Social Network Analysis with R ∗
Yanchang Zhao
http://www.RDataMining.com
R and Data Mining Course
Beijing University of Posts and Telecommunications,
Beijing, China
July 2019
∗ Chapter 11: Social Network Analysis, in R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining-book.pdf
Contents
Graph and Social Network Analysis
Graph Construction
Graph Visualization
Graph Query
Centrality Measures
Advanced Graph Visualization
R Packages
Wrap Up
Further Readings and Online Resources
Network and Graph
Nodes, vertices or entities
Edges, links or relationships
Network analysis, graph mining
Link prediction, community/group detection, entity resolution,
recommender systems, information propagation modeling
Graph Databases and Graph Processing Systems
Neo4j: https://neo4j.com
Giraph on Hadoop: http://giraph.apache.org
GraphX on Spark: http://spark.apache.org/graphx/
Social Network Analysis
Graph construction
Graph query
Centrality measures
Graph visualization
Clustering and community detection
Graph Construction
Tom, Ben, Bob and Mary are friends of John.
Alice and Wendy are friends of Mary.
Wendy is a friend of David.
library(igraph)
# nodes
nodes <- data.frame(
name = c("Tom","Ben","Bob","John","Mary","Alice","Wendy","David"),
gender = c("M", "M", "M", "M", "F", "F", "F", "M"),
age = c( 16, 30, 42, 29, 26, 32, 18, 22)
)
# relations
edges <- data.frame(
from = c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy"),
to = c("John", "John", "John", "John","Mary", "Mary", "David")
)
# build a graph object
## graph_from_data_frame() is the current name of graph.data.frame()
g <- graph_from_data_frame(edges, directed=FALSE, vertices=nodes)
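To make concrete what the edge list encodes, the sketch below rebuilds the same friendship graph as a plain adjacency matrix using base R only. The helper `edges_to_adjacency` and the variable names are illustrative assumptions, not igraph API; igraph uses its own internal representation, so this only shows the information the graph object holds.

```r
# Same friendship data as above, rebuilt with base R only (no igraph needed)
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David")
friend_edges <- data.frame(
  from = c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy"),
  to   = c("John", "John", "John", "John", "Mary", "Mary", "David"),
  stringsAsFactors = FALSE  # keep names as strings so they index by name
)

# Hypothetical helper: symmetric 0/1 adjacency matrix of an undirected graph
edges_to_adjacency <- function(edges, vertices) {
  A <- matrix(0, nrow = length(vertices), ncol = length(vertices),
              dimnames = list(vertices, vertices))
  for (i in seq_len(nrow(edges))) {
    u <- edges$from[i]
    v <- edges$to[i]
    A[u, v] <- 1
    A[v, u] <- 1  # undirected: mirror every edge
  }
  A
}

A <- edges_to_adjacency(friend_edges, people)
rowSums(A)  # number of friends of each person
```

The row sums should agree with the `degree()` values shown on the Centrality Measures slide.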
Graph Visualization
layout1 <- g %>% layout_nicely() ## save layout for reuse
g %>% plot(vertex.size = 30, layout = layout1)
[Figure: the friendship graph drawn with layout1; nodes Tom, Ben, Bob, John, Mary, Alice, Wendy and David]
Graph Visualization (cont.)
## use blue for male and pink for female
colors <- ifelse(V(g)$gender=="M", "skyblue", "pink")
g %>% plot(vertex.size=30, vertex.color=colors, layout=layout1)
[Figure: the same layout with male nodes in sky blue and female nodes in pink]
Graph Query
## nodes
V(g)
## + 8/8 vertices, named, from 8dfec3f:
## [1] Tom Ben Bob John Mary Alice Wendy David
## edges
E(g)
## + 7/7 edges from 8dfec3f (vertex names):
## [1] Tom --John Ben --John Bob --John John --Mary
## [5] Mary --Alice Mary --Wendy Wendy--David
## immediate neighbors (friends) of John
friends <- ego(g,order=1,nodes="John",mindist=1)[[1]] %>% print()
## + 4/8 vertices, named, from 8dfec3f:
## [1] Tom Ben Bob Mary
## female friends of John
friends[friends$gender == "F"]
## + 1/8 vertex, named, from 8dfec3f:
## [1] Mary
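The neighbor query and the attribute filter above can be mimicked without igraph, which may help clarify what `ego(..., order=1, mindist=1)` and `friends$gender` return. This is a minimal base-R sketch on the same data; `neighbors_of` is a hypothetical helper, and it assumes an undirected graph stored as a from/to data frame.

```r
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David")
gender <- setNames(c("M", "M", "M", "M", "F", "F", "F", "M"), people)
friend_edges <- data.frame(
  from = c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy"),
  to   = c("John", "John", "John", "John", "Mary", "Mary", "David"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: direct neighbors of one vertex in an undirected edge list
neighbors_of <- function(edges, vertex) {
  unique(c(edges$to[edges$from == vertex], edges$from[edges$to == vertex]))
}

friends <- neighbors_of(friend_edges, "John")
friends                           # "Tom" "Ben" "Bob" "Mary"
friends[gender[friends] == "F"]   # "Mary"
```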
Graph Query (cont.)
## 1- and 2-order neighbors (friends) of John
g2 <- make_ego_graph(g, order=2, nodes="John")[[1]]
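## recompute colors for the 7 vertices of g2 (David is not in it)
g2 %>% plot(vertex.size=30,
            vertex.color=ifelse(V(g2)$gender=="M", "skyblue", "pink"))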
[Figure: John's order-2 ego graph with nodes Tom, Ben, Bob, John, Mary, Alice and Wendy]
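`make_ego_graph()` collects every vertex within a given number of hops. One way to picture this is a breadth-first search (BFS) that records hop distances. The sketch below uses hypothetical helpers `bfs_distances` and `ego_sketch`, which keep vertices whose distance lies between `mindist` and `order`, loosely mirroring the arguments of `ego()`. It assumes a small undirected graph and is an illustration, not how igraph implements the query.

```r
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David")
friend_edges <- data.frame(
  from = c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy"),
  to   = c("John", "John", "John", "John", "Mary", "Mary", "David"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: hop distance from `source` to every vertex via BFS;
# unreachable vertices keep distance Inf
bfs_distances <- function(edges, vertices, source) {
  dist <- setNames(rep(Inf, length(vertices)), vertices)
  dist[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    nbrs <- c(edges$to[edges$from == v], edges$from[edges$to == v])
    for (w in nbrs) {
      if (is.infinite(dist[w])) {   # first visit: one hop further than v
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  dist
}

# Hypothetical helper: vertices at distance mindist..order from `vertex`
ego_sketch <- function(edges, vertices, vertex, order, mindist = 0) {
  d <- bfs_distances(edges, vertices, vertex)
  vertices[d >= mindist & d <= order]
}

ego_sketch(friend_edges, people, "John", order = 2)
# "Tom" "Ben" "Bob" "John" "Mary" "Alice" "Wendy"
ego_sketch(friend_edges, people, "John", order = 1, mindist = 1)
# "Tom" "Ben" "Bob" "Mary"
```

David is left out of the order-2 result because he is three hops from John.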
Friendship Graph
[Figure: the friendship graph of Tom, Ben, Bob, John, Mary, Alice, Wendy and David]
Centrality Measures
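Degree: the number of edges adjacent to a node; in-degree and
out-degree for directed graphs
Closeness: the inverse of the sum of the shortest-path lengths
to/from all other nodes (igraph's default; with normalized=TRUE
it is the inverse of the average length)
Betweenness: the number of shortest paths between other pairs of
nodes that pass through a node (a pair with several shortest paths
contributes the fraction of them that pass through it)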
degree <- g %>% degree() %>% print()
## Tom Ben Bob John Mary Alice Wendy David
## 1 1 1 4 3 1 2 1
closeness <- g %>% closeness() %>% round(2) %>% print()
## Tom Ben Bob John Mary Alice Wendy David
## 0.06 0.06 0.06 0.09 0.09 0.06 0.07 0.05
betweenness <- g %>% betweenness() %>% print()
## Tom Ben Bob John Mary Alice Wendy David
## 0 0 0 15 14 0 6 0
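As a cross-check of these definitions, the sketch below recomputes the three scores in base R on the friendship graph, assuming an unweighted, undirected and connected graph. Closeness is taken as the inverse of the total distance (matching igraph's unnormalized default), and betweenness uses Brandes' algorithm: a BFS from each source that counts shortest paths, followed by back-propagation of pair dependencies. The helpers `bfs_paths` and `centrality_sketch` are hypothetical names.

```r
# Friendship graph as a base-R adjacency matrix
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David")
from <- c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy")
to   <- c("John", "John", "John", "John", "Mary", "Mary", "David")
A <- matrix(0, length(people), length(people), dimnames = list(people, people))
i <- match(from, people)
j <- match(to, people)
A[cbind(i, j)] <- 1
A[cbind(j, i)] <- 1  # undirected: store both directions

# BFS from s that also counts shortest paths (first phase of Brandes' algorithm)
bfs_paths <- function(A, s) {
  n <- nrow(A)
  dist <- rep(Inf, n)
  dist[s] <- 0
  sigma <- numeric(n)               # number of shortest paths from s
  sigma[s] <- 1
  preds <- vector("list", n)        # predecessors on those shortest paths
  visited <- integer(0)             # vertices in non-decreasing distance order
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    visited <- c(visited, v)
    for (w in which(A[v, ] > 0)) {
      if (is.infinite(dist[w])) {   # first time w is reached
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
      if (dist[w] == dist[v] + 1) { # v lies on a shortest path to w
        sigma[w] <- sigma[w] + sigma[v]
        preds[[w]] <- c(preds[[w]], v)
      }
    }
  }
  list(dist = dist, sigma = sigma, preds = preds, visited = visited)
}

centrality_sketch <- function(A) {
  n <- nrow(A)
  closeness <- numeric(n)
  betweenness <- numeric(n)
  for (s in seq_len(n)) {
    r <- bfs_paths(A, s)
    closeness[s] <- 1 / sum(r$dist[-s])  # inverse of the total distance
    delta <- numeric(n)                  # dependency of source s on each vertex
    for (w in rev(r$visited)) {          # farthest vertices first
      for (v in r$preds[[w]]) {
        delta[v] <- delta[v] + r$sigma[v] / r$sigma[w] * (1 + delta[w])
      }
      if (w != s) betweenness[w] <- betweenness[w] + delta[w]
    }
  }
  data.frame(degree = rowSums(A),
             closeness = closeness,
             betweenness = betweenness / 2,  # undirected: each pair counted twice
             row.names = rownames(A))
}

round(centrality_sketch(A), 2)
```

On this graph the result should reproduce the degree, closeness and betweenness vectors printed above. Weighted or disconnected graphs need the extra handling that igraph's own functions provide.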
Centrality Measures (cont.)
Eigenvector centrality: the entries of the leading eigenvector of
the graph adjacency matrix, scaled so that the largest is 1
Transitivity, a.k.a. the local clustering coefficient, measures the
probability that two neighbors of a node are also connected to each
other; it is NaN for nodes with fewer than two neighbors.
eigenvector <- eigen_centrality(g)$vector %>% round(2) %>% print()
## Tom Ben Bob John Mary Alice Wendy David
## 0.45 0.45 0.45 1.00 0.85 0.38 0.48 0.22
transitivity <- g %>% transitivity(type = "local") %>% print()
## [1] NaN NaN NaN 0 0 NaN 0 NaN
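Both measures can also be read off the adjacency matrix directly. In this base-R sketch, eigenvector centrality is the leading eigenvector from `eigen()` rescaled so its largest entry is 1, and the local clustering coefficient divides the number of triangles through a node (half the diagonal of A³) by the number of neighbor pairs. It assumes a connected, undirected, unweighted graph; igraph uses its own numerical routines, so the last digits may differ slightly.

```r
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David")
from <- c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy")
to   <- c("John", "John", "John", "John", "Mary", "Mary", "David")
A <- matrix(0, length(people), length(people), dimnames = list(people, people))
i <- match(from, people)
j <- match(to, people)
A[cbind(i, j)] <- 1
A[cbind(j, i)] <- 1

# Eigenvector centrality: leading eigenvector of A, rescaled so the maximum is 1
ev <- eigen(A, symmetric = TRUE)   # eigenvalues are returned in decreasing order
x <- abs(ev$vectors[, 1])          # the sign of an eigenvector is arbitrary
eigen_sketch <- setNames(x / max(x), people)

# Local clustering coefficient: triangles through a node / pairs of its neighbors
k <- rowSums(A)                            # degree
triangles <- diag(A %*% A %*% A) / 2       # closed 3-walks count each triangle twice
local_cc <- triangles / (k * (k - 1) / 2)  # 0/0 gives NaN when degree < 2

round(eigen_sketch, 2)
local_cc
```

Nodes with fewer than two neighbors give 0/0, which R evaluates to NaN, and every other node scores 0 because the friendship graph contains no triangles.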
Static Network Visualization
Static network visualization
Fast in rendering big graphs
For very big graphs, it is most efficient to save the
visualization to a file instead of drawing it on screen.
Save network diagram into files: pdf(), bmp(), jpeg(),
png(), tiff()
library(igraph)
## plot directly to screen when graph is small
plot(g)
## for big graphs, save visualization to a PDF file
pdf("mygraph.pdf")
plot(g)
graphics.off() ## or dev.off()
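The same open-device, draw, close-device pattern works for any plot. The sketch below draws the friendship graph with base graphics on a hypothetical circular layout and writes it to a PDF in a temporary directory; the file name, page size and layout are illustrative assumptions. The key step is `dev.off()`, which closes the device so the file is completely written.

```r
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David")
from <- c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy")
to   <- c("John", "John", "John", "John", "Mary", "Mary", "David")

# Hypothetical circular layout: vertices evenly spaced on the unit circle
theta <- 2 * pi * (seq_along(people) - 1) / length(people)
xy <- cbind(x = cos(theta), y = sin(theta))
rownames(xy) <- people

out_file <- file.path(tempdir(), "friends.pdf")
pdf(out_file, width = 6, height = 6)   # draw into a file instead of the screen
plot(xy, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "",
     xlim = c(-1.2, 1.2), ylim = c(-1.2, 1.2))
segments(xy[from, "x"], xy[from, "y"], xy[to, "x"], xy[to, "y"], col = "grey50")
points(xy, pch = 21, cex = 4, bg = "skyblue")
text(xy, labels = people, cex = 0.7)
invisible(dev.off())                   # close the device so the file is flushed
file.exists(out_file)
```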
Interactive Network Visualization
Coordinates of other nodes are not adjusted when moving a
node.
Can be slow when rendering big graphs
Save network diagram into files: visSave(), visExport()
library(visNetwork)
visIgraph(g, idToLabel=TRUE) %>%
## highlight nodes connected to a selected node
visOptions(highlightNearest=TRUE) %>%
## use different icons for different types (groups) of nodes
visGroups(groupname="person", shape="icon",
icon=list(code="f007")) %>%
... %>%
## use FontAwesome icons
addFontAwesome() %>%
## add legend of nodes
visLegend() %>%
## to save to file
visSave(file = "network.html")
Interactive Network Visualization (cont.)
Dynamically adjusting coordinates for better visualization
Very slow when rendering big graphs
x <- toVisNetworkData(g)
visNetwork(nodes=x$nodes, edges=x$edges)%>%
## use different icons for different types (groups) of nodes
visGroups(groupname="person", shape="icon",
icon=list(code="f007")) %>%
... %>%
## use FontAwesome icons
addFontAwesome() %>%
## add legend of nodes
visLegend()
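`toVisNetworkData()` converts an igraph object into the two data frames that `visNetwork()` takes. They can also be built by hand: the node table needs an `id` column (optionally `label` and `group`), and the edge table needs `from` and `to` columns. The sketch below assumes the friendship data from earlier and groups nodes by gender; the group names and colors are illustrative choices, and the plotting step only runs when visNetwork is installed.

```r
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David")
gender <- c("M", "M", "M", "M", "F", "F", "F", "M")

# Node table: `id` is required; `label` and `group` are optional columns
vis_nodes <- data.frame(id = people, label = people,
                        group = ifelse(gender == "M", "male", "female"),
                        stringsAsFactors = FALSE)
# Edge table: `from` and `to` refer to node ids
vis_edges <- data.frame(
  from = c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy"),
  to   = c("John", "John", "John", "John", "Mary", "Mary", "David"),
  stringsAsFactors = FALSE
)

# Every edge endpoint must refer to an existing node id
stopifnot(all(c(vis_edges$from, vis_edges$to) %in% vis_nodes$id))

# Build the widget only when visNetwork is available; print `net` to view it
if (requireNamespace("visNetwork", quietly = TRUE)) {
  library(visNetwork)
  net <- visNetwork(nodes = vis_nodes, edges = vis_edges) %>%
    visGroups(groupname = "male", color = "skyblue") %>%
    visGroups(groupname = "female", color = "pink") %>%
    visLegend()
}
```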
Load Graph Data
## download graph data
url <- "http://www.rdatamining.com/data/graph.rdata"
dir.create("./data", showWarnings = FALSE)
## mode = "wb" keeps the binary .rdata file intact on Windows
download.file(url, destfile = "./data/graph.rdata", mode = "wb")
library(igraph)
# load graph data into R
# what will be loaded: g, nodes, edges
load("./data/graph.rdata")
Build a Graph
head(nodes, 3)
## name type
## 1 T9 tid
## 2 T24 tid
## 3 T13 tid
head(edges, 3)
## from to
## 1 T9 P27
## 2 T24 P8
## 3 T13 P2
## build a graph object
g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)
g
## IGRAPH 9597c42 UN-B 61 60 --
## + attr: name (v/c), type (v/c)
## + edges from 9597c42 (vertex names):
## [1] T9 --P27 T24--P8 T13--P2 T27--P10 T29--P29 T2 --P27
## [7] T16--P21 T27--P20 T17--P30 T14--P20 T29--P22 T14--P17
## [13] T21--P18 T18--P9 T4 --P5 T9 --A29 T24--A28 T13--A21
Example of Static Network Visualization
library(igraph)
plot(g, vertex.size=12, vertex.label.cex=0.7,
vertex.color=as.factor(V(g)$type), vertex.frame.color=NA)
[Figure: static plot of the 61-node graph, with node labels such as T9, P27, A29, N5 and E20, colored by node type]
Example of Interactive Network Visualization
library(visNetwork)
V(g)$group <- V(g)$type
## visualization
data <- toVisNetworkData(g)
visNetwork(nodes=data$nodes, edges=data$edges) %>%
visGroups(groupname="tid",shape="icon",icon=list(code="f15c")) %>%
visGroups(groupname="person",shape="icon",icon=list(code="f007")) %>%
visGroups(groupname="addr",shape="icon",icon=list(code="f015")) %>%
visGroups(groupname="phone",shape="icon",icon=list(code="f095")) %>%
visGroups(groupname="email",shape="icon",icon=list(code="f0e0")) %>%
addFontAwesome() %>%
visLegend()
R Packages
Network analysis: igraph, sna, statnet
Network visualization: visNetwork
Interface with graph databases: RNeo4j
Package igraph †
V(g), E(g): nodes and edges of graph g
degree, betweenness, closeness, transitivity:
various centrality scores
neighborhood: neighborhood of graph vertices
cliques, largest.cliques, maximal.cliques,
clique.number: find cliques, i.e. complete subgraphs
clusters, no.clusters: maximal connected components
of a graph and the number of them
fastgreedy.community, spinglass.community:
community detection
cohesive.blocks: calculate cohesive blocks
induced.subgraph: create a subgraph of a graph (igraph)
read.graph, write.graph: read and write graphs from and
to files of various formats
† https://cran.r-project.org/package=igraph
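`clusters()` and `no.clusters()` report the maximal connected components of a graph. A minimal base-R sketch of the idea: start a BFS from every vertex not yet labelled and give every vertex it reaches the same component id. The helper `component_labels` is a hypothetical name, and to obtain more than one component the example adds two hypothetical people, Eve and Frank, who only know each other; they are not part of the original data.

```r
# Hypothetical helper: component id of each vertex, via repeated BFS
component_labels <- function(A) {
  n <- nrow(A)
  membership <- setNames(rep(NA_integer_, n), rownames(A))
  current <- 0L
  for (s in seq_len(n)) {
    if (!is.na(membership[s])) next  # already inside a known component
    current <- current + 1L
    membership[s] <- current
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in which(A[v, ] > 0)) {
        if (is.na(membership[w])) {
          membership[w] <- current
          queue <- c(queue, w)
        }
      }
    }
  }
  membership
}

# Friendship graph plus two hypothetical extra people forming their own pair
people <- c("Tom", "Ben", "Bob", "John", "Mary", "Alice", "Wendy", "David",
            "Eve", "Frank")
from <- c("Tom", "Ben", "Bob", "Mary", "Alice", "Wendy", "Wendy", "Eve")
to   <- c("John", "John", "John", "John", "Mary", "Mary", "David", "Frank")
A <- matrix(0, length(people), length(people), dimnames = list(people, people))
i <- match(from, people)
j <- match(to, people)
A[cbind(i, j)] <- 1
A[cbind(j, i)] <- 1

memb <- component_labels(A)
memb          # component id of each person
max(memb)     # number of components
table(memb)   # component sizes
```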
Wrap Up
Packages igraph and sna
Static visualization
Can visualize nodes with shapes, images and icons
Can visualize very large graphs
Support network analysis and graph mining
Package visNetwork
Interactive visualization
Can visualize nodes with shapes, images and icons
Image rendering can be very slow for large graphs
Designed for visualization only, and does not support network
analysis and graph mining
Further Readings
Social network analysis (SNA)
https://en.wikipedia.org/wiki/Social_network_analysis
igraph – a network analysis package, supporting R, Python
and C/C++
http://igraph.org
sna – an R package for social network analysis
https://cran.r-project.org/web/packages/sna/index.html
statnet – software tools for the analysis, simulation and
visualization of network data; also available as an R package
http://www.statnet.org
visNetwork – an R package for network visualization
http://datastorm-open.github.io/visNetwork/
Online Resources
Book titled R and Data Mining: Examples and Case
Studies [Zhao, 2012]
http://www.rdatamining.com/docs/RDataMining-book.pdf
R Reference Card for Data Mining
http://www.rdatamining.com/docs/RDataMining-reference-card.pdf
Free online courses and documents
http://www.rdatamining.com/resources/
RDataMining Group on LinkedIn (27,000+ members)
http://group.rdatamining.com
Twitter (3,300+ followers)
@RDataMining
The End
Thanks!
Email: yanchang(at)RDataMining.com
Twitter: @RDataMining
How to Cite This Work
Citation
Yanchang Zhao. R and Data Mining: Examples and Case Studies. ISBN
978-0-12-396963-7, December 2012. Academic Press, Elsevier. 256
pages. URL: http://www.rdatamining.com/docs/RDataMining-book.pdf.
BibTex
@BOOK{Zhao2012R,
title = {R and Data Mining: Examples and Case Studies},
publisher = {Academic Press, Elsevier},
year = {2012},
author = {Yanchang Zhao},
pages = {256},
month = {December},
isbn = {978-0-12-396963-7},
keywords = {R, data mining},
url = {http://www.rdatamining.com/docs/RDataMining-book.pdf}
}
References I
Zhao, Y. (2012).
R and Data Mining: Examples and Case Studies, ISBN 978-0-12-396963-7.
Academic Press, Elsevier.