MILAN 20/21.11.2015
Graphs are everywhere!
Distributed graph computing with Spark GraphX
Andrea Iacono
MILAN 20/21.11.2015 - Andrea Iacono
Agenda:
● Graph definitions and usages
● GraphX introduction
● Pregel
● Code examples
The main focus will be the programming model
The code is available at:
https://github.com/andreaiacono/TalkGraphX
A graph is a set of vertices and edges that connect them:
(figure: a sample graph, with one edge and one vertex labelled)
Graphs are used for modeling very different domains.
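As a rough illustration of this definition, here is a minimal plain-Scala sketch (not GraphX code) that stores a directed graph as a set of vertex ids plus a list of edges, and derives neighbours and degrees from it. The names `SimpleGraph`, `Edge` and `Graph` are hypothetical, chosen only for this example.

```scala
// Hypothetical minimal model: a set of vertex ids plus a list of directed edges.
object SimpleGraph {
  type VertexId = Long

  // A directed edge from src to dst.
  final case class Edge(src: VertexId, dst: VertexId)

  final case class Graph(vertices: Set[VertexId], edges: Seq[Edge]) {
    // Vertices reachable from v through one outgoing edge.
    def outNeighbours(v: VertexId): Set[VertexId] =
      edges.collect { case Edge(`v`, dst) => dst }.toSet

    // Number of edges leaving v.
    def outDegree(v: VertexId): Int = edges.count(_.src == v)

    // Number of edges entering v.
    def inDegree(v: VertexId): Int = edges.count(_.dst == v)
  }

  def main(args: Array[String]): Unit = {
    val g = Graph(Set(1L, 2L, 3L), Seq(Edge(1L, 2L), Edge(1L, 3L), Edge(2L, 3L)))
    println(g.outNeighbours(1L)) // Set(2, 3)
    println(g.inDegree(3L))      // 2
  }
}
```

Keeping vertices and edges as two separate collections loosely mirrors how GraphX splits a graph into a vertex collection and an edge collection.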
Networks
Routing
Page Rank
Definitions
Undirected Directed
Definitions
Connected Disconnected
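One common way to test whether an undirected graph is connected is a breadth-first search from any vertex: the graph is connected exactly when the search reaches every vertex. A minimal plain-Scala sketch, assuming vertices are numbered `0 until n` (the object and method names are illustrative):

```scala
import scala.collection.mutable

object Connectivity {
  // Undirected graph with vertices 0 until n, given as a list of edges.
  def isConnected(n: Int, edges: Seq[(Int, Int)]): Boolean =
    if (n == 0) true
    else {
      // Adjacency lists: each undirected edge is stored in both directions.
      val adj = Array.fill(n)(mutable.ListBuffer.empty[Int])
      for ((a, b) <- edges) { adj(a) += b; adj(b) += a }

      // Breadth-first search from vertex 0, marking every vertex it reaches.
      val visited = Array.fill(n)(false)
      val queue = mutable.Queue(0)
      visited(0) = true
      while (queue.nonEmpty) {
        val v = queue.dequeue()
        for (w <- adj(v) if !visited(w)) {
          visited(w) = true
          queue.enqueue(w)
        }
      }
      // Connected iff the search reached every vertex.
      visited.forall(reached => reached)
    }

  def main(args: Array[String]): Unit = {
    println(isConnected(3, Seq((0, 1), (1, 2)))) // true
    println(isConnected(4, Seq((0, 1), (2, 3)))) // false
  }
}
```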
Definitions
Complete: K5
Bipartite (and complete): K2,3
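The edge counts of the two examples follow directly from the definitions: K_n joins every pair of distinct vertices, while K_{m,n} joins each of the m vertices on one side to each of the n vertices on the other side.

```latex
|E(K_n)| = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad |E(K_5)| = \frac{5 \cdot 4}{2} = 10
|E(K_{m,n})| = m \cdot n, \qquad |E(K_{2,3})| = 2 \cdot 3 = 6
```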
Definitions
Cyclic Acyclic
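For directed graphs, one way to tell the two cases apart is Kahn's algorithm: repeatedly remove vertices with no incoming edges. If every vertex can be removed, the graph is acyclic. This is a single-machine Scala sketch, assuming vertices numbered `0 until n`; the names are illustrative.

```scala
import scala.collection.mutable

object Acyclicity {
  // Directed graph with vertices 0 until n. Kahn's algorithm: repeatedly remove
  // vertices with no incoming edges; vertices that can never be removed lie on
  // a cycle or are reachable from one.
  def isAcyclic(n: Int, edges: Seq[(Int, Int)]): Boolean = {
    val adj = Array.fill(n)(mutable.ListBuffer.empty[Int])
    val inDegree = Array.fill(n)(0)
    for ((src, dst) <- edges) {
      adj(src) += dst
      inDegree(dst) += 1
    }
    val ready = mutable.Queue.empty[Int]
    ready ++= (0 until n).filter(v => inDegree(v) == 0)
    var removed = 0
    while (ready.nonEmpty) {
      val v = ready.dequeue()
      removed += 1
      // Removing v lowers the in-degree of each of its successors.
      for (w <- adj(v)) {
        inDegree(w) -= 1
        if (inDegree(w) == 0) ready.enqueue(w)
      }
    }
    removed == n
  }
}
```

A self-loop `(v, v)` keeps the in-degree of `v` above zero, so it is correctly reported as a cycle.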
Definitions
Multigraph Pseudograph
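Following the definitions in the speaker notes (a multigraph allows several edges with the same source and destination, and a pseudograph allows an edge whose source and destination are the same vertex), a directed edge list can be classified with two simple checks. This is a minimal Scala sketch with illustrative names. GraphX itself models a property graph as a directed multigraph, so parallel edges are allowed there.

```scala
object EdgeKinds {
  // Directed edges as (source, destination) pairs.

  // Parallel edges: the same (source, destination) pair appears more than once.
  def hasParallelEdges(edges: Seq[(Int, Int)]): Boolean =
    edges.distinct.size != edges.size

  // Loops: an edge whose source and destination are the same vertex.
  def hasLoops(edges: Seq[(Int, Int)]): Boolean =
    edges.exists { case (src, dst) => src == dst }

  // Classification following the definitions in the speaker notes.
  def classify(edges: Seq[(Int, Int)]): String =
    if (hasLoops(edges)) "pseudograph"
    else if (hasParallelEdges(edges)) "multigraph"
    else "simple graph"
}
```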
Definitions
An undirected acyclic connected graph is a tree!
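An equivalent test that avoids a separate connectivity check: an undirected graph with n vertices is a tree exactly when it has n - 1 edges and no cycle, because an acyclic graph with n - 1 edges is necessarily connected. The sketch below detects cycles with union-find. It is a plain-Scala illustration assuming vertices `0 until n`, not code from the talk.

```scala
object TreeCheck {
  // Undirected graph with vertices 0 until n. It is a tree iff it has exactly
  // n - 1 edges and no cycle (an acyclic graph with n - 1 edges is connected).
  def isTree(n: Int, edges: Seq[(Int, Int)]): Boolean =
    if (edges.size != n - 1) false
    else {
      // Union-find: parent(v) == v marks the root of a set.
      val parent = Array.tabulate(n)(i => i)
      def find(x: Int): Int = {
        var v = x
        while (parent(v) != v) {
          parent(v) = parent(parent(v)) // path halving
          v = parent(v)
        }
        v
      }
      // An edge whose endpoints are already in the same set closes a cycle.
      edges.forall { case (a, b) =>
        val ra = find(a)
        val rb = find(b)
        if (ra == rb) false
        else { parent(ra) = rb; true }
      }
    }
}
```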
What's wrong with MapReduce?
Every MapReduce run reads its initial data from disk (e.g. HDFS),
computes the results and then writes them back to disk. Since most
graph algorithms are iterative, every iteration has to read and write
the whole dataset from/to disk.
It's better to use a distributed dataflow framework that can keep data in memory across iterations.
GraphX is a graph processing system
built on top of Apache Spark
“Graph processing systems represent graph structured data as a property
graph, which associates user-defined properties with each vertex and edge.”
“The Spark storage abstraction called Resilient Distributed Datasets (RDDs)
enables applications to keep data in memory, which is essential for iterative
graph algorithms.”
“RDDs permit user-defined data partitioning, and the execution engine can
exploit this to co-partition RDDs and co-schedule tasks to avoid data
movement. This is essential for encoding partitioned graphs.”
Excerpt from GraphX: Graph Processing in a Distributed Dataflow Framework
https://amplab.cs.berkeley.edu/wp-content/uploads/2014/09/graphx.pdf
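To make the property-graph idea concrete, here is a plain-Scala sketch (not the GraphX API) in which each vertex carries a user-defined property (a name) and each edge carries one too (a relationship label). The `triplets` helper joins every edge with the properties of its two endpoints, which is the view GraphX exposes as `graph.triplets`. The field names are modelled on GraphX's `Edge` and `EdgeTriplet`, while the users and labels are hypothetical data.

```scala
object PropertyGraphSketch {
  type VertexId = Long

  // An edge with a user-defined attribute (field names modelled on GraphX's Edge).
  final case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)

  // An edge together with the attributes of both its endpoints.
  final case class Triplet[VD, ED](srcId: VertexId, srcAttr: VD, dstId: VertexId, dstAttr: VD, attr: ED)

  // Join every edge with the properties of its source and destination vertices.
  def triplets[VD, ED](vertices: Map[VertexId, VD], edges: Seq[Edge[ED]]): Seq[Triplet[VD, ED]] =
    edges.map(e => Triplet(e.srcId, vertices(e.srcId), e.dstId, vertices(e.dstId), e.attr))

  def main(args: Array[String]): Unit = {
    val users = Map(1L -> "alice", 2L -> "bob", 3L -> "carol")
    val relations = Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows"), Edge(3L, 1L, "likes"))
    // Prints "alice follows bob", "bob follows carol", "carol likes alice".
    triplets(users, relations).foreach(t => println(s"${t.srcAttr} ${t.attr} ${t.dstAttr}"))
  }
}
```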
GraphX / Spark software stack
(image source: Spark site)
Graph Databases
● Storage
● Query language
● Transactions
● Examples: Neo4j, OrientDB, Titan
Graph Processing Systems
● APIs for traversing and processing
● Better performance (in-memory data)
● Examples: GraphX, Giraph, GraphLab
Pregel
A computational model designed by Google
(https://kowshik.github.io/JPregel/pregel_paper.pdf)
A computation consists of a sequence of supersteps, repeated until termination. In each superstep,
every vertex can:
● modify its own state or that of its outgoing edges
● receive the messages sent to it during the previous superstep
● send messages to other vertices (typically its neighbours), which will receive them in the next superstep
● vote to halt
When a vertex votes to halt, it becomes inactive; if it receives a message in a later superstep,
the framework reactivates it.
The computation stops when all the vertices have voted to halt and no messages are in transit;
alternatively, a maximum number of iterations can be set.
Edges have no computation of their own.
When writing algorithms, you have to think like a vertex.
Pregel sample
Image source: Pregel paper
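The superstep loop can be simulated on a single machine for the maximum-value example described in the speaker notes. Every vertex starts with a value; in superstep 0 all vertices send their value to their neighbours. From then on, a vertex that receives a larger value adopts it and forwards it, while one that does not votes to halt. Because halted vertices wake up only when a message arrives, the run ends when no messages are in transit. This is an illustrative sketch, not a distributed implementation, and the names are hypothetical.

```scala
object PregelMaxValue {
  // Single-machine simulation of Pregel supersteps (not a distributed implementation).
  // values: initial value of each vertex; edges: directed (src, dst) pairs.
  // Returns the final values and the number of supersteps executed.
  def run(values: Map[Int, Int], edges: Seq[(Int, Int)]): (Map[Int, Int], Int) = {
    val out: Map[Int, Seq[Int]] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2) }

    // Superstep 0: every vertex is active and sends its value to its neighbours.
    var state = values
    var inbox = send(values.keys, state, out)
    var supersteps = 1

    // Vertices that voted to halt wake up only if a message arrives,
    // so the computation ends when no messages are in transit.
    while (inbox.nonEmpty) {
      // A vertex updates (and sends) only if it received a larger value;
      // otherwise it votes to halt.
      val updated: Map[Int, Int] = inbox.toSeq
        .filter { case (v, msgs) => msgs.max > state(v) }
        .map { case (v, msgs) => v -> msgs.max }
        .toMap
      state = state ++ updated
      inbox = send(updated.keys, state, out)
      supersteps += 1
    }
    (state, supersteps)
  }

  // Each sender sends its current value along every outgoing edge;
  // messages are grouped by destination vertex.
  private def send(senders: Iterable[Int], state: Map[Int, Int], out: Map[Int, Seq[Int]]): Map[Int, Seq[Int]] =
    senders.toSeq
      .flatMap(v => out.getOrElse(v, Seq.empty[Int]).map(dst => dst -> state(v)))
      .groupBy(_._1)
      .map { case (dst, ms) => dst -> ms.map(_._2) }

  def main(args: Array[String]): Unit = {
    // A chain 1 - 2 - 3 - 4 with edges in both directions.
    val edges = Seq((1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3))
    val (result, steps) = run(Map(1 -> 3, 2 -> 6, 3 -> 2, 4 -> 1), edges)
    // Every vertex ends with 6, after 4 supersteps (0 to 3).
    println(result)
    println(steps)
  }
}
```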
GraphX implementation of Pregel
GraphX uses three functions to implement Pregel:
● vprog: the vertex program, run on each vertex that receives a message; it takes the
incoming message and computes the new vertex value
● sendMsg: the function used to send messages to other vertices
● mergeMsg: a function that takes two incoming messages and merges
them into a single message; it must be commutative and associative
Unlike Google's Pregel, the GraphX implementation of Pregel:
● keeps message construction out of the vertex program, for a more efficient
distributed execution
● gives access to the attributes of both vertices of an edge while building the
messages
● constrains message sending to the graph structure (only to neighbours)
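To make the three functions concrete, here is a plain-Scala sketch of single-source shortest paths written in the vprog/sendMsg/mergeMsg style. `Edge` and `Triplet` are minimal stand-ins modelled on GraphX's `Edge` and `EdgeTriplet`, not the real classes. The `pregel` driver is a simplified, single-machine assumption: unlike GraphX, it re-evaluates `sendMsg` on every edge in each iteration instead of only around vertices that received messages.

```scala
object PregelShortestPaths {
  type VertexId = Long

  // Minimal stand-ins for GraphX's Edge and EdgeTriplet (plain Scala, not the GraphX API).
  final case class Edge(srcId: VertexId, dstId: VertexId, attr: Double)
  final case class Triplet(srcId: VertexId, srcAttr: Double, dstId: VertexId, dstAttr: Double, attr: Double)

  // vprog: new vertex value from the current value and the merged incoming message.
  def vprog(id: VertexId, dist: Double, msg: Double): Double = math.min(dist, msg)

  // sendMsg: sees both endpoint attributes of an edge and decides what to send.
  def sendMsg(t: Triplet): Iterator[(VertexId, Double)] =
    if (t.srcAttr + t.attr < t.dstAttr) Iterator((t.dstId, t.srcAttr + t.attr))
    else Iterator.empty

  // mergeMsg: combines two messages for the same vertex (commutative and associative).
  def mergeMsg(a: Double, b: Double): Double = math.min(a, b)

  // Simplified driver: apply vprog with the initial message, then iterate until no messages.
  def pregel(vertices: Map[VertexId, Double], edges: Seq[Edge], initialMsg: Double,
             maxIterations: Int = Int.MaxValue): Map[VertexId, Double] = {
    var state = vertices.map { case (id, v) => id -> vprog(id, v, initialMsg) }
    var messages = gather(state, edges)
    var iteration = 0
    while (messages.nonEmpty && iteration < maxIterations) {
      state = state ++ messages.map { case (id, m) => id -> vprog(id, state(id), m) }
      messages = gather(state, edges)
      iteration += 1
    }
    state
  }

  // Run sendMsg on every edge and merge the messages addressed to the same vertex.
  private def gather(state: Map[VertexId, Double], edges: Seq[Edge]): Map[VertexId, Double] =
    edges
      .flatMap(e => sendMsg(Triplet(e.srcId, state(e.srcId), e.dstId, state(e.dstId), e.attr)))
      .groupBy(_._1)
      .map { case (id, ms) => id -> ms.map(_._2).reduce((a, b) => mergeMsg(a, b)) }

  def main(args: Array[String]): Unit = {
    val inf = Double.PositiveInfinity
    val vertices = Map(1L -> 0.0, 2L -> inf, 3L -> inf, 4L -> inf) // vertex 1 is the source
    val edges = Seq(Edge(1L, 2L, 1.0), Edge(1L, 3L, 4.0), Edge(2L, 3L, 2.0), Edge(3L, 4L, 1.0))
    // Distances: 1 -> 0.0, 2 -> 1.0, 3 -> 3.0, 4 -> 4.0
    println(pregel(vertices, edges, inf))
  }
}
```

In GraphX itself, the same three functions would be passed to `graph.pregel(initialMsg)(vprog, sendMsg, mergeMsg)`, with `sendMsg` receiving an `EdgeTriplet`.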
GraphX Pregel communication diagram
When to use GraphX
GraphX is well suited for algorithms that:
● respect the neighborhood structure
GraphX is NOT well suited for algorithms that:
● need iteration among distant vertices
● change the structure of the graph
Algorithms out of the box (as of Spark v1.5.1):
- Connected Components
- Label Propagation
- PageRank
- SVD++
- Shortest Paths
- Strongly Connected Components
- Triangle Count
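PageRank, listed above, is the iterative algorithm outlined in the speaker notes: each page (vertex) splits its rank among its outgoing links, and a page's new rank grows with the rank flowing in through its incoming links. Below is a minimal single-machine Scala sketch of the classic formula. The damping factor 0.85 and the even redistribution of rank held by pages without outgoing links are assumptions of this sketch, and GraphX's built-in PageRank may normalise ranks differently.

```scala
object PageRankSketch {
  // Classic PageRank on a directed graph with vertices 0 until n (n > 0).
  // damping = 0.85 is an assumed, commonly used value.
  def pageRank(n: Int, edges: Seq[(Int, Int)], iterations: Int, damping: Double = 0.85): Array[Double] = {
    val outDegree = Array.fill(n)(0)
    for ((src, _) <- edges) outDegree(src) += 1

    var rank = Array.fill(n)(1.0 / n)
    for (_ <- 1 to iterations) {
      // Rank held by vertices without outgoing links is spread evenly over all vertices.
      val danglingMass = (0 until n).filter(v => outDegree(v) == 0).map(v => rank(v)).sum
      val next = Array.fill(n)((1 - damping) / n + damping * danglingMass / n)
      // Each vertex splits its rank evenly among its outgoing links.
      for ((src, dst) <- edges) next(dst) += damping * rank(src) / outDegree(src)
      rank = next
    }
    rank // the ranks always sum to 1
  }

  def main(args: Array[String]): Unit = {
    // Vertex 2 is linked from both 0 and 1, so it ends up with the highest rank.
    val ranks = pageRank(3, Seq((0, 1), (0, 2), (1, 2)), iterations = 20)
    println(ranks.mkString(", "))
  }
}
```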
Now some code!
Questions & Answers
Leave your feedback on Joind.in!
https://m.joind.in/event/codemotion-milan-2015


Editor's Notes

  • #3: Questions for the audience: Who knows what a graph is? Who has ever used one? Who knows the most common algorithms (BFS, DFS, Dijkstra)? Who knows Scala?
  • #4: Vertices and edges
  • #5: Triangle counting for clustering. Commercial interest: targeted offers to groups with the same interests.
  • #6: Vertices = intersections, edges = roads. Shortest path algorithm (Dijkstra), where edges have several weights: typically distance, traffic, tolls, etc.
  • #7: Pages = vertices, edges = incoming links. Each outgoing edge has a weight tied to that of its vertex; the greater the sum of the values of the incoming edges, the greater the weight of the vertex. Iterative algorithm.
  • #9: Directed / undirected
  • #10: Connected / disconnected
  • #11: K is the standard notation for this kind of graph. A bipartite graph is useful for e-commerce, when you have all the user nodes that can buy any of the product nodes.
  • #12: Cyclic / acyclic (without cycles)
  • #13: Multigraph: when there can be several edges with the same source and the same destination. Pseudograph: when an edge can have the same vertex as both source and destination.
  • #14: When I said that graphs are everywhere, it was mostly because of this!
  • #15: Here we are talking about large graphs that don't fit in the RAM of a single machine.
  • #16: The graph shown is a multi-pseudograph. ????? Internal representation?
  • #17: Unlike Spark, which offers APIs in Scala, Java and Python, GraphX offers them only in Scala; however, they should become available in the near future.
  • #19: Gremlin graph query language (TinkerPop): Gremlin is a DSL for traversing property graphs. Neo4j uses (proprietary) Cypher as its native query language. Titan is a graph database that supports these storage backends: Cassandra (column), HBase (column), BerkeleyDB (key-value).
  • #21: Imagine having a value for each vertex and wanting to find the maximum value in the whole graph. With this computation model, the idea is that we have to propagate the information among the nodes. In each superstep, every vertex that has received a value higher than its own sends it to all its neighbours. When no vertex changes anymore, the algorithm has terminated.
  • #22: Commutative: 2 + 3 = 3 + 2. Associative: (2 + 3) + 4 = 2 + (3 + 4).
  • #32: JetBrains raffle