Social Network Analysis

AOT LAB
DII, UNIPR

Enrico Franchi (efranchi@ce.unipr.it)


Outline

SNA = Complex Network Analysis on Social Networks

Notation & Metrics
  Degree Distribution
  Path Lengths
  Transitivity

Models
  Random Graphs
  Small-Worlds
  Preferential Attachment

Models Discussion

Conclusion


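Network
  $G = (V, E), \quad E \subset V^2$
  $\{(x, x) \mid x \in V\} \cap E = \emptyset$ (no self-loops)

Adjacency Matrix
  $A_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$

Directed Network
  $k_i^{out} = \sum_j A_{ij}, \qquad k_i^{in} = \sum_j A_{ji}$
  $k_i = k_i^{in} + k_i^{out}$

Undirected Network
  A symmetric
  $k_i = \sum_j A_{ji} = \sum_j A_{ij}$

Degree Distribution
  $p_x = \frac{1}{n} \#\{ i \mid k_i = x \}$

Average Degree
  $\langle k \rangle = n^{-1} \sum_{x \in V} k_x$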
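As a minimal sketch of the notation above, the degrees, the degree distribution and the average degree can be read off a 0/1 adjacency matrix. The function names and the toy network are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def degrees(adj):
    """Return (k_in, k_out, k) for a 0/1 adjacency matrix given as a list of rows."""
    n = len(adj)
    k_out = [sum(adj[i][j] for j in range(n)) for i in range(n)]  # row sums
    k_in = [sum(adj[j][i] for j in range(n)) for i in range(n)]   # column sums
    k = [k_in[i] + k_out[i] for i in range(n)]                    # directed case
    return k_in, k_out, k

def degree_distribution(ks):
    """p_x = #{i : k_i = x} / n, returned as a dict x -> p_x."""
    n = len(ks)
    return {x: count / n for x, count in Counter(ks).items()}

def average_degree(ks):
    return sum(ks) / len(ks)

# Directed toy network (hypothetical): 0 -> 1, 0 -> 2, 1 -> 2
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
k_in, k_out, k = degrees(A)
```

For an undirected network the matrix is symmetric, so either the row sums or the column sums alone give k_i.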

Measure of Transitivity

Local Clustering Coefficient
  $C_i = \binom{k_i}{2}^{-1} T(i)$
  T(i): # distinct triangles with i as vertex

Clustering Coefficient
  $C = \frac{1}{n} \sum_{i \in V} C_i$

  $C = \frac{\text{number of closed paths of length 2}}{\text{number of paths of length 2}} = \frac{(\text{number of triangles}) \times 3}{\text{number of connected triples}}$
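A small sketch of both clustering measures, assuming the network is stored as a dict mapping each node to the set of its neighbours (a representation chosen here for illustration). Nodes with fewer than two neighbours are given C_i = 0, which is a convention and not something the slides state.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = T(i) / binom(k_i, 2); taken as 0 when k_i < 2 (a convention)."""
    neigh = adj[i]
    k = len(neigh)
    if k < 2:
        return 0.0
    triangles = sum(1 for u, v in combinations(neigh, 2) if v in adj[u])
    return triangles / (k * (k - 1) / 2)

def average_clustering(adj):
    """C = (1/n) * sum of the local coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

def transitivity(adj):
    """Closed paths of length 2 divided by all paths of length 2."""
    closed = total = 0
    for i in adj:
        for u, v in combinations(adj[i], 2):
            total += 1
            closed += v in adj[u]
    return closed / total if total else 0.0

# Triangle 0-1-2 with a pendant node 3 attached to node 2 (hypothetical example)
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

On this toy graph the averaged local coefficient is 7/12 while the ratio of closed to all paths of length 2 is 3/5, so the two definitions need not coincide.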

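Shortest Path Length and Diameter

The matrix product depends on the scalar operations of the semi-ring $(\mathcal{A}, +, \cdot)$, where $\mathcal{A}$ is the set of adjacency matrices:

  $AB = A \mathbin{+.\cdot} B, \qquad [AB]_{ij} = \sum_k A_{ik} \cdot B_{kj}$

Other matrix products make sense: e.g., $(\mathcal{A}, +, \wedge)$ or $(\mathcal{A}, \wedge, +)$, where $\wedge$ denotes min.

We consider: $S_k(M) = \left( M \mathbin{+.\wedge} M^k \mathbin{\wedge.+} M^k \right)$

Shortest path lengths matrix: $L = (S_n \circ \dots \circ S_1)(M)$

Diameter: $d = \max_{ij} L_{ij}$
Average shortest path: $\ell = \langle L_{ij} \rangle$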
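To illustrate the idea of swapping the scalar operations, here is a sketch of the standard (min, +) matrix product with repeated squaring. It is not a transcription of the S_k operator on the slide; the function names and the example graph are assumptions made for illustration.

```python
import math

INF = math.inf

def min_plus(A, B):
    """(min, +) product: entry (i, j) is min over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shortest_path_lengths(adj):
    """All-pairs shortest path lengths of an unweighted graph by repeated squaring."""
    n = len(adj)
    # Distances using at most one edge: 0 on the diagonal, 1 for edges, inf otherwise.
    L = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
         for i in range(n)]
    hops = 1
    while hops < n - 1:
        L = min_plus(L, L)   # now covers paths of up to 2 * hops edges
        hops *= 2
    return L

# Path graph 0 - 1 - 2 - 3 (hypothetical example)
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
L = shortest_path_lengths(A)
n = len(A)
finite = [L[i][j] for i in range(n) for j in range(n) if i != j and L[i][j] < INF]
diameter = max(finite)
average_shortest_path = sum(finite) / len(finite)
```

Each product costs on the order of n^3 scalar operations and about log n products are needed, so this sketch favours clarity over speed.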

Computational Complexity of ASPL:

All pairs shortest path matrix based (parallelizable): $O(n^{3+\alpha})$, $\alpha \approx 3/4$

All pairs shortest path Bellman-Ford: $O(n^3)$

All pairs shortest path Dijkstra w. Fibonacci Heaps: $O(n^2 \log n + nm)$

Computing the CPL

$x = M_q(S)$: $q\,\#S$ elements are $\le x$ and $(1-q)\,\#S$ are $> x$

$x = L_{q\delta}(S)$: $q\,\#S\,(1-\delta)$ elements are $\le x$ and $(1-q)\,\#S\,(1-\delta)$ are $> x$

Huber Algorithm

Let R be a random sample of S such that #R = s, with

  $s = \frac{2}{q^2} \ln\frac{2}{\epsilon} \, \frac{(1-\delta)^2}{\delta^2}$

Then $L_{q\delta}(S) = M_q(R)$ with probability $p = 1 - \epsilon$.
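The sampling idea can be sketched as follows, assuming S is available as a Python list (for the CPL, the list of shortest path lengths). The helper names are hypothetical, and rounding s up to the next integer is an assumption made here.

```python
import math
import random

def huber_sample_size(q, delta, eps):
    """s = (2 / q^2) * ln(2 / eps) * (1 - delta)^2 / delta^2, rounded up."""
    s = (2 / q**2) * math.log(2 / eps) * (1 - delta)**2 / delta**2
    return math.ceil(s)

def q_median(values, q):
    """Smallest x in values such that at least q * len(values) elements are <= x."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]

def approximate_q_median(values, q, delta, eps, rng=random):
    """Estimate L_{q,delta}(S) as M_q(R) for a random sample R of size s."""
    s = min(len(values), huber_sample_size(q, delta, eps))
    return q_median(rng.sample(values, s), q)

sample_size = huber_sample_size(q=0.5, delta=0.1, eps=0.05)
```

The sample size depends only on q, δ and ε, not on #S, which is what makes the approach attractive for large networks.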


Facebook Hugs Degree Distribution

Nodes: 1322631
Edges: 1555597
m/n: 1.17
CPL: 11.74
Clustering Coefficient: 0.0527
Number of Components: 18987
Isles: 0
Largest Component Size: 1169456

[Figure: degree distribution on log-log axes]

For large k we have statistical fluctuations.
For small k power-laws do not hold.

Many networks have power-law degree distribution.

  $p_k \propto k^{-\gamma}, \qquad \gamma > 1$

  $\langle k^r \rangle = ?$

• Citation networks
• Biological networks
• WWW graph
• Internet graph
• Social Networks

[Figure: Power-Law, gamma=3, on log-log axes]
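The question about the moments can be worked out directly, assuming a pure power law for all degrees above some minimum degree $k_{\min}$ (a symbol introduced here for illustration):

```latex
\langle k^r \rangle
  = \sum_{k \ge k_{\min}} k^r \, p_k
  \;\propto\; \sum_{k \ge k_{\min}} k^{\,r-\gamma},
\qquad \text{which is finite if and only if } \gamma - r > 1 .
```

For $\gamma = 3$ this means the mean degree ($r = 1$) is finite, while the second moment ($r = 2$) diverges in the infinite-size limit; in a finite network it is bounded by the largest degree actually present.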

Erdős-Rényi Random Graphs

G(n, p): $\Pr(A_{ij} = 1) = p$
G(n, m)

Connectedness Threshold: $\log n / n$

Ensembles of Graphs
When we describe values of properties, we actually mean the expected value of the property:

  $d := \langle d \rangle = \sum_G \Pr(G) \cdot d(G) \propto \frac{\log n}{\log \langle k \rangle}$

  $\Pr(G) = p^m (1-p)^{\binom{n}{2} - m}$

  $\langle m \rangle = \binom{n}{2} p, \qquad \langle k \rangle = (n-1)p, \qquad C = \langle k \rangle (n-1)^{-1}$

  $p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k}$

  $n \to \infty: \quad p_k = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}$
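A minimal sketch of sampling from G(n, p) and comparing the measured average degree with the expected value (n - 1)p. The function name, the seed and the parameter values are assumptions chosen for illustration.

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """Sample from G(n, p): each of the binom(n, 2) possible edges appears with probability p."""
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

rng = random.Random(42)
n, p = 400, 0.05
G = gnp(n, p, rng)
mean_degree = sum(len(neigh) for neigh in G.values()) / n
expected_degree = (n - 1) * p   # <k> = (n - 1) p
```

A single sample fluctuates around the ensemble average, so mean_degree should be close to, but not exactly equal to, expected_degree.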

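Watts-Strogatz Model
In the modified model, we only add the edges.

  $k_i = \kappa + s_i$ (κ: edges in the lattice; $s_i$: # added shortcuts)

  $p_s = e^{-\kappa p} \frac{(\kappa p)^s}{s!}$

  $p_k = e^{-\kappa p} \frac{(\kappa p)^{k-\kappa}}{(k-\kappa)!}$

  $C = \frac{3(\kappa - 2)}{4(\kappa - 1) + 8\kappa p + 4\kappa p^2}$

  $\ell \approx \frac{\log(np\kappa)}{\kappa^2 p}$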
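The modified model can be sketched as a ring lattice plus random shortcuts. The construction below is an assumption about the details (one shortcut attempt per lattice edge, endpoints chosen uniformly, attempts that hit an existing edge are simply lost); the function names are hypothetical.

```python
import random
from itertools import combinations

def ring_lattice(n, kappa):
    """Ring where each node links to its kappa nearest neighbours (kappa even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, kappa // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def add_shortcuts(adj, kappa, p, rng):
    """For each of the n * kappa / 2 lattice edges, add a random shortcut with probability p."""
    n = len(adj)
    for _ in range(n * kappa // 2):
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)   # two distinct endpoints
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    total = 0.0
    for neigh in adj.values():
        k = len(neigh)
        if k >= 2:
            links = sum(1 for u, v in combinations(neigh, 2) if v in adj[u])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

kappa = 4
lattice = ring_lattice(20, kappa)
c0 = average_clustering(lattice)   # compare with 3(kappa - 2) / (4(kappa - 1)) at p = 0
small_world = add_shortcuts(ring_lattice(1000, kappa), kappa, 0.1, random.Random(7))
```

Since edges are only added, every node keeps at least its κ lattice edges, which is the content of k_i = κ + s_i.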

Watts-Strogatz Model - 10000 nodes, k = 4

[Figure: CPL(p)/CPL(0) and C(p)/C(0) as functions of p, for p from 0 to 1]

Plot annotations: Short CPL threshold; Large Clustering Coefficient threshold
Barabási-Albert Model

Connectedness Threshold: $\log n / \log\log n$

BARABASI-ALBERT-MODEL(G,M0,STEPS)
  FOR K FROM 1 TO STEPS
    N0 ← NEW-NODE(G)
    ADD-NODE(G,N0)
    A ← MAKE-ARRAY()
    FOR N IN NODES(G)
      PUSH(A, N)
      FOR J IN DEGREE(N)
        PUSH(A, N)
    FOR J FROM 1 TO M
      N ← RANDOM-CHOICE(A)
      ADD-LINK (N0, N)

  $\Pr(V = x) = \sum_{e \in N(x)} \Pr(E = e) = \frac{k_x}{m} = \frac{2 k_x}{\sum_x k_x}$

  $p_k \propto k^{-3}$

  $\ell \approx \frac{\log n}{\log\log n}$ (scale-free entails short CPL)

  $C \approx n^{-3/4}$ (transitivity disappears with network size; no analytical proof available)
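The pseudocode can be turned into a runnable sketch along the following lines. Two details are assumptions that depart from it: the array of candidates is built before the new node is added (so the new node cannot link to itself), and the m targets are forced to be distinct. Nodes are assumed to be labelled 0..n-1 and the graph is a dict mapping each node to its set of neighbours.

```python
import random

def barabasi_albert(graph, m, steps, rng):
    """Grow `graph` (dict: node -> set of neighbours) by preferential attachment."""
    for _ in range(steps):
        # Each existing node appears once plus once per incident edge,
        # so it is drawn with probability proportional to degree + 1.
        pool = []
        for node, neigh in graph.items():
            pool.extend([node] * (len(neigh) + 1))
        targets = set()
        while len(targets) < m:          # requires at least m existing nodes
            targets.add(rng.choice(pool))
        new = len(graph)                 # assumes nodes are labelled 0..n-1
        graph[new] = set()
        for t in targets:
            graph[new].add(t)
            graph[t].add(new)
    return graph

seed = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}   # triangle as the initial network
G = barabasi_albert(seed, m=2, steps=50, rng=random.Random(1))
edges = sum(len(neigh) for neigh in G.values()) // 2
```

Rebuilding the candidate array at every step mirrors the pseudocode; keeping it across steps and appending to it would be the obvious optimisation.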

OSN Refs. Users Links <k> C CPL d γ r
Club Nexus Adamic et al 2.5 K 10 K 8.2 0.2 4 13 n.a. n.a.
Cyworld Ahn et al 12 M 191 M 31.6 0.2 3.2 16 -0.1
Cyworld T Ahn et al 92 K 0.7 M 15.3 0.3 7.2 n.a. n.a. 0.4
LiveJournal Mislove et al 5 M 77 M 17 0.3 5.9 20 0.2
Flickr Mislove et al 1.8 M 22 M 12.2 0.3 5.7 27 0.2
Twitter Kwak et al 41 M 1700 M n.a. n.a. 4 4.1 n.a.
Orkut Mislove et al 3 M 223 M 106 0.2 4.3 9 1.5 0.1
Orkut Ahn et al 100 K 1.5 M 30.2 0.3 3.8 n.a. 3.7 0.3
Youtube Mislove et al 1.1 M 5 M 4.29 0.1 5.1 21 -0
Facebook Gjoka et al 1 M n.a. n.a. 0.2 n.a. n.a. 0.23
FB H Nazir et al 51 K 116 K n.a. 0.4 n.a. 29 n.a.
FB GL Nazir et al 277 K 600 K n.a. 0.3 n.a. 45 n.a.
BrightKite Scellato et al 54 K 213 K 7.88 0.2 4.7 n.a. n.a.
FourSquare Scellato et al 58 K 351 K 12 0.3 4.6 n.a. n.a.
LiveJournal Scellato et al 993 K 29.6 M 29.9 0.2 4.9 n.a. n.a.
Twitter Java et al 87 K 829 K 18.9 0.1 n.a. 6 0.59
Twitter Scellato et al 409 K 183 M 447 0.2 2.8 n.a. n.a.

Model  Static  Deg      C        Rigid
ER     Yes     Poisson  Low      -
WS     Yes     Poisson  Ok       Yes
BA     No      PL γ=3   Fixable  Yes

• Moreover:
  • Mostly no navigability
  • Uniformity assumption
  • Sometimes too complex for analytic study
  • Few features studied
  • Power-law?

Alternative models for degree distributions
Power-laws are difficult to fit.
When they do fit, other distributions often fit better.

Power-law with cutoff almost always fits better than plain power-law.

  $f(x; \gamma, \beta) = x^{-\gamma} e^{\beta x}$

Sometimes the log-normal distribution is more appropriate.

  $f(x; \sigma, m) = \frac{1}{x \sigma (2\pi)^{1/2}} \exp\left( -\frac{(\log(x/m))^2}{2\sigma^2} \right)$

Most of the time, random and preferential attachment processes act together.

  $F(x; r) = 1 - (rm)^{1+r} (x + rm)^{-(1+r)}$

  $r \to 0$: scale-free
  $r \to \infty$: negative exponential distribution
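The large-r limit of F(x; r) can be checked numerically. Rewriting $(rm)^{1+r}(x + rm)^{-(1+r)}$ as $(1 + x/(rm))^{-(1+r)}$ shows that it tends to $e^{-x/m}$ as r grows. The sketch below uses that rewritten form to avoid overflow; the function names and the values of m and x are assumptions for illustration.

```python
import math

def mixed_attachment_cdf(x, r, m):
    """F(x; r) = 1 - (rm)^(1+r) (x + rm)^-(1+r), evaluated in a numerically stable form."""
    # (rm / (x + rm))^(1+r) == exp(-(1 + r) * log(1 + x / (rm)))
    return 1.0 - math.exp(-(1.0 + r) * math.log1p(x / (r * m)))

def exponential_cdf(x, m):
    """CDF of the negative exponential distribution with mean m."""
    return 1.0 - math.exp(-x / m)

m, x = 2.0, 3.0
large_r = mixed_attachment_cdf(x, 1e6, m)                # close to exponential_cdf(x, m)
small_r_tail = 1.0 - mixed_attachment_cdf(x, 0.01, m)    # tail decays as x^-(1+r) for x >> rm
```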

Milgram’s Experiment ... and Kleinberg’s Intuition

• Random people from Omaha (Nebraska) & Wichita (Kansas) were asked to send a postcard to a person in Boston (Massachusetts):
  • Write the name on the postcard
  • Forward the message only to people personally known who were more likely to know the target

1st run: 64/296 arrived, most delivered to him by 2 men
2nd run: 24/160 arrived, 2/3 delivered by “Mr. Jacobs”
2 ≤ hops ≤ 10; µ = 5.x

6 Degrees: CPL, hubs, ...

Biased Preferential Attachment
At each step:

• A new node is added to the network and is assigned to one of the sets P, I and L according to a probability distribution h
• $e_0 \in \mathbb{N}^+$ edges are added to the network
• For each edge (u,v), u is chosen with distribution $D^0$ and:
  • if u ∈ I, v is a new node and is assigned to P;
  • if u ∈ L, v is chosen according to $D^\gamma$.

  $D^\beta(u) \propto \begin{cases} (\beta + 1)(k_u + 1) & u \in L \\ k_u + 1 & u \in I \\ 0 & u \in P \end{cases}$

No analytic results available.
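The biased choice of a node can be sketched as weighted sampling with the unnormalised weights of $D^\beta$. Storing each node as a (set label, degree) pair and the names used below are assumptions made for this example; the rest of the growth process is not modelled here.

```python
import random

def bias_weight(kind, degree, beta):
    """Unnormalised D^beta(u) for a node in set `kind` ('P', 'I' or 'L') with degree k_u."""
    if kind == "L":
        return (beta + 1) * (degree + 1)
    if kind == "I":
        return degree + 1
    return 0   # nodes in P are never chosen

def choose_node(nodes, beta, rng):
    """nodes: dict name -> (kind, degree); draw one name with probability proportional to D^beta."""
    names = list(nodes)
    weights = [bias_weight(*nodes[u], beta) for u in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical three-node network: one node in each set
nodes = {"a": ("L", 3), "b": ("I", 3), "c": ("P", 10)}
picked = choose_node(nodes, beta=1.0, rng=random.Random(0))
```

With β = 0 the nodes in L and I are weighted alike by degree plus one, and larger β shifts the choice towards nodes in L.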

Transitive Linking Model [Davidsen 02]

• At each step:
  • TL: a random node is chosen, and it introduces two other nodes that are linked to it; if the node does not have 2 edges, it introduces itself to a random node
  • RM: with probability p a node is chosen and removed along with its edges and replaced with a node with one random edge
• When $p \ll 1$ the TL dominates the process:
  • the degree distribution is a power-law with cutoff
  • $1 - C = p(\langle k \rangle - 1)$, i.e., quite large in practice
• For larger values of p the two different processes combine to form an exponential degree distribution
• For $p \approx 1$ the degree distribution is essentially a Poisson distribution

(Slide excerpt from: Bergenti, Franchi, Poggi (Univ. Parma), Models for Agent-based Simulation of SN, SNAMAS ’11)

Instead of p it would make sense to have distinct p and r parameters for nodes leaving and entering the network.

Few analytic results available.
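A rough simulation sketch of the two rules described above. Several details are assumptions: the network starts with n isolated nodes, a removed node is replaced by reusing its label, and a random pick that happens to select the node itself adds no edge. All names are hypothetical.

```python
import random

def link(graph, u, v):
    """Add an undirected edge, ignoring self-loops."""
    if u != v:
        graph[u].add(v)
        graph[v].add(u)

def transitive_linking_step(graph, p, rng):
    nodes = list(graph)
    # TL: a random node introduces two of its neighbours to each other.
    i = rng.choice(nodes)
    if len(graph[i]) >= 2:
        u, v = rng.sample(sorted(graph[i]), 2)
        link(graph, u, v)
    else:
        link(graph, i, rng.choice(nodes))   # introduces itself to a random node
    # RM: with probability p a random node is removed and replaced.
    if rng.random() < p:
        j = rng.choice(nodes)
        for neighbour in graph[j]:
            graph[neighbour].discard(j)
        graph[j] = set()                     # label j is reused for the new node
        link(graph, j, rng.choice(nodes))    # one random edge

def simulate(n, p, steps, seed=0):
    rng = random.Random(seed)
    graph = {i: set() for i in range(n)}
    for _ in range(steps):
        transitive_linking_step(graph, p, rng)
    return graph

G = simulate(n=200, p=0.05, steps=5000)
```

The number of nodes stays constant by construction, so the effect of p can be explored by comparing the degree sequences of runs with small and large p.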

[1] Dorogovtsev, S. N. and Mendes, J. F. F. 2003. Evolution of Networks: From Biological Nets to the Internet and WWW (Physics). Oxford University Press, USA.
[2] Watts, D. J. 2003. Small Worlds: The Dynamics of Networks between Order and Randomness (Princeton Studies in Complexity). Princeton University Press.
[3] Jackson, M. O. 2010. Social and Economic Networks. Princeton University Press.
[4] Newman, M. 2010. Networks: An Introduction. Oxford University Press, USA.
[5] Wasserman, S. and Faust, K. 1994. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press.
[6] Scott, J. P. 2000. Social Network Analysis: A Handbook. Sage Publications Ltd.
[7] Kepner, J. and Gilbert, J. 2011. Graph Algorithms in the Language of Linear Algebra (Software, Environments, and Tools). Society for Industrial & Applied Mathematics.
[8] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2009. Introduction to Algorithms. The MIT Press.
[9] Skiena, S. S. 2010. The Algorithm Design Manual. Springer.
[10] Bollobas, B. 1998. Modern Graph Theory. Springer.
[11] Watts, D. J. and Strogatz, S. H. 1998. Collective dynamics of ‘small-world’ networks. Nature. 393, 6684, 440-442.
[12] Barabási, A. L. and Albert, R. 1999. Emergence of scaling in random networks. Science. 286, 5439, 509.
[13] Kleinberg, J. 2000. The small-world phenomenon: an algorithm perspective. Proceedings of the thirty-second annual ACM symposium on Theory of computing. 163-170.
[14] Milgram, S. 1967. The small world problem. Psychology today. 2, 1, 60-67.

Thanks for your kind attention.

Enrico Franchi (efranchi@ce.unipr.it)
AOTLAB, Dipartimento Ingegneria dell’Informazione,
Università di Parma
