Using Local Spectral Methods to Robustify Graph-Based Learning

David F. Gleich, Purdue University
Joint work with Michael Mahoney @ Berkeley

Supported by NSF CAREER CCF-1149756
Code: www.cs.purdue.edu/homes/dgleich/codes/robust-diffusions
KDD 2015
The graph-based data analysis pipeline
[Illustration: a small 0/1 data matrix feeding the pipeline.]
Raw data
•  Relationships
•  Images
•  Text records
•  Etc.

Convert to a graph
•  Nearest neighbors
•  Kernels
•  2-mode to 1-mode
•  Etc.

Algorithm/Learning
•  Important nodes
•  Infer features
•  Clustering
•  Etc.
"Noise" in the initial data modeling decisions

Explicit graphs are those that are given to a data analyst.
"A social network"
•  Known spam accounts included?
•  Users not logged in for a year?
•  Etc.
A type of noise.

Constructed graphs are built based on some other primary data.
"Nearest neighbor graphs"
•  K-NN or ε-NN
•  Thresholding correlations to zero
Often made for computational convenience! (Graph too big.)
A different type of noise!

Labeled graphs occur in information diffusion/propagation.
"Function prediction"
•  Labeled nodes
•  Labeled edges
•  Some are wrong
A direct type of noise!

Do these decisions matter? Our experience: Yes! Dramatically so!
The graph-based data analysis pipeline
[The same pipeline as above: a raw data matrix, converted to a graph, fed to an algorithm/learning step.]
Most algorithmic and statistical research happens in the algorithm/learning step.
The potential downstream signal is determined by the graph-construction step.
Our goal: towards an integrative analysis
•  How does the graph creation process affect the outcomes of graph-based learning?
•  Is there anything we can do to make this process more robust?
Graph-based learning is usually only
one component of a big pipeline
[Illustration: many data matrices, one per database, feeding into the pipeline.]
Many databases over genes with survival rates for various cancers
→ List of possible genes responsible for survival
→ Cluster analysis
→ Reintegration of data

THIS STEP SHOULD BE ROBUST TO VARIATIONS ABOVE!
Scalable graph analytics

Local methods are one of the most successful classes of scalable graph analytics.
They don't even look at the entire graph.
•  Andersen-Chung-Lang (ACL) push method

Conjecture: Local methods regularize some variant of the original algorithm or problem.
Justification: For ACL and a few relatives this is exact!
Impact? Improved robustness to noise?
For instance, to answer "what function" is shared by the starting node, we'd only look at the circled region of the graph.
Our contributions

We study these issues in the case of semi-supervised learning (SSL) on graphs.
1.  We illustrate a common mincut framework for a variety of SSL methods.
2.  Show how to "localize" one (and make it scalable!).
3.  Provide a more robust SSL labeling method.
4.  Identify a weakness in SSL methods: they cannot use extra edges! We find one useful way to do so.
Semi-supervised graph-based learning
Given a graph and a few labeled nodes, predict the labels on the rest of the graph.

Algorithm
1.  Run a diffusion for each label (possibly with negative information from the other classes).
2.  Assign new labels based on the value of each diffusion (a minimal sketch of this loop follows).
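As a minimal sketch of this two-step loop (not the paper's code), with the diffusion itself left abstract; run_diffusion is a hypothetical placeholder for any of the diffusions discussed on the following slides:

```python
# Minimal sketch of the two-step SSL loop. `run_diffusion(A, seeds)` is a
# hypothetical placeholder returning one score per node for a given seed set.
import numpy as np

def ssl_predict(A, labels, run_diffusion):
    """A: (n x n) adjacency matrix; labels: dict node -> class id (the few known labels)."""
    classes = sorted(set(labels.values()))
    n = A.shape[0]
    scores = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        seeds = [v for v, lab in labels.items() if lab == c]
        # Step 1: one diffusion per class, seeded on that class's labeled nodes.
        scores[:, j] = run_diffusion(A, seeds)
    # Step 2: assign each node the class whose diffusion value is largest.
    pred = np.array(classes)[np.argmax(scores, axis=1)]
    for v, lab in labels.items():       # keep the given labels fixed
        pred[v] = lab
    return pred
```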
The diffusions proposed for semi-supervised learning are s,t-cut minorants
[Figure: an example graph on nodes 1-10 with added terminals s and t.]
In the unweighted case, solve via max-flow.
In the weighted case, solve via network simplex or an industrial LP.

MINCUT LP:
  minimize  \sum_{ij \in E} C_{i,j} |x_i - x_j|
  subject to  x_s = 1, x_t = 0.

Spectral minorant (a linear system):
  minimize  \sqrt{ \sum_{ij \in E} C_{i,j} |x_i - x_j|^2 }
  subject to  x_s = 1, x_t = 0.
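For intuition, the square root does not change the minimizer, so the spectral minorant is just a graph-Laplacian linear system with the values at s and t held fixed. A minimal dense-matrix sketch, assuming C is a symmetric weight matrix that already includes the s/t attachment edges (names and the dense solver are mine, not the paper's):

```python
# Sketch: the spectral minorant as a Laplacian linear system.
# C: symmetric (n x n) numpy array of edge weights, s/t attachments included.
# fixed: dict node index -> boundary value (e.g. {s: 1.0, t: 0.0}).
import numpy as np

def spectral_minorant(C, fixed):
    n = C.shape[0]
    L = np.diag(C.sum(axis=1)) - C                         # weighted graph Laplacian
    F = np.array(sorted(fixed))                            # boundary (fixed) nodes
    U = np.array([i for i in range(n) if i not in fixed])  # free nodes
    x = np.zeros(n)
    x[F] = [fixed[i] for i in F]
    # Minimizing sum_{ij} C_ij (x_i - x_j)^2 with x fixed on F gives the
    # normal equations  L[U,U] x_U = -L[U,F] x_F.
    x[U] = np.linalg.solve(L[np.ix_(U, U)], -L[np.ix_(U, F)] @ x[F])
    return x
```

With 0/1 boundary values this is the harmonic-function solution on the augmented graph, which is why the choice of s/t attachment weights is what distinguishes the constructions on the next slide.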
Representative cut problems
[Figure: two cut constructions on the example graph. In ZGL, s and t are attached to the labeled nodes by infinite-weight edges; in Zhou et al., s and t are attached to the nodes by degree-scaled α-weighted edges. Positive labels, negative labels, and unlabeled nodes are marked.]

Andersen-Lang has a weighting variation too. Joachims has a variation too.

Zhou et al., NIPS 2003; Zhu et al., ICML 2003; Andersen-Lang, SODA 2008; Joachims, ICML 2003.
These help our intuition about the solutions
All spectral minorants are linear systems.
Implicit regularization views
on the Zhou et al. diffusion
[Figure: the Zhou et al. cut construction, with s and t attached to the nodes by degree-scaled α-weighted edges.]
RESULT: The spectral minorant of Zhou is equivalent to the weakly-local MOV solution.
PROOF: The two linear systems are the same (after working out a few equivalences).
IMPORTANCE: We'd expect Zhou to be "more robust."
Spectral minorant, for reference:
  minimize  \sqrt{ \sum_{ij \in E} C_{i,j} |x_i - x_j|^2 }
  subject to  x_s = 1, x_t = 0.
The Mahoney-Orecchia-Vishnoi (MOV) vector is a localized variation on the Fiedler vector used to find a small-conductance set near a seed set.
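For concreteness, the Zhou et al. (NIPS 2003) diffusion is commonly written as the linear system f = (1 - α)(I - αS)^{-1} y with S = D^{-1/2} A D^{-1/2}. A small dense sketch, meant only to show which system is being discussed, not the MOV equivalence itself:

```python
# Sketch: the Zhou et al. (NIPS 2003) diffusion as a dense linear solve,
# f = (1 - alpha) * (I - alpha * S)^{-1} y,  with  S = D^{-1/2} A D^{-1/2}.
import numpy as np

def zhou_diffusion(A, seeds, alpha=0.99):
    n = A.shape[0]
    d = A.sum(axis=1)
    dinv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))     # guard against isolated nodes
    S = (dinv_sqrt[:, None] * A) * dinv_sqrt[None, :]   # symmetric normalization
    y = np.zeros(n)
    y[list(seeds)] = 1.0                                # indicator of the labeled seeds
    return np.linalg.solve(np.eye(n) - alpha * S, (1 - alpha) * y)
```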
A scalable, localized algorithm for Zhou et al.'s diffusion
RESULT: We can use a variation on coordinate descent methods related to the Andersen-Chung-Lang PUSH procedure to solve Zhou's diffusion in a scalable manner.
PROOF: See Gleich-Mahoney, ICML '14.
IMPORTANCE (1): We should be able to make Zhou et al. scale.
IMPORTANCE (2): Using this algorithm adds another implicit regularization term that should further improve robustness!
Spectral minorant:
  minimize  \sqrt{ \sum_{ij \in E} C_{i,j} |x_i - x_j|^2 }
  subject to  x_s = 1, x_t = 0.

Implicitly regularized variant:
  minimize  \sum_{ij \in E} C_{i,j} |x_i - x_j|^2 + \tau \sum_{i \in V} d_i x_i
  subject to  x_s = 1, x_t = 0, x_i \ge 0.
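The paper's solver is a coordinate-descent relative of the Andersen-Chung-Lang push procedure; the exact update rules are in Gleich-Mahoney, ICML '14. The sketch below is a generic push loop for a PageRank-style diffusion, not the paper's iteration; it shows why the method is local (only nodes with a large residual are ever touched) and why the residual threshold behaves like the extra regularization term above.

```python
# Rough sketch of an ACL-style "push" loop for a PageRank-like diffusion.
# NOT the exact iteration from the paper; illustrative only.
from collections import deque

def push_diffusion(adj, seeds, alpha=0.85, eps=1e-6):
    """adj: dict node -> list of neighbors; seeds: labeled nodes for one class."""
    seeds = list(seeds)
    x, r = {}, {}                              # solution and residual, both sparse
    for s in seeds:
        r[s] = r.get(s, 0.0) + 1.0 / len(seeds)
    queue = deque(r)
    while queue:
        u = queue.popleft()
        du = max(len(adj.get(u, [])), 1)
        if r.get(u, 0.0) < eps * du:           # residual too small: skip this node
            continue                           # (the threshold acts like regularization)
        ru = r.pop(u)
        x[u] = x.get(u, 0.0) + alpha * ru      # keep a fraction of the mass at u
        for v in adj.get(u, []):               # spread the rest to the neighbors
            r[v] = r.get(v, 0.0) + (1 - alpha) * ru / du
            if r[v] >= eps * max(len(adj.get(v, [])), 1):
                queue.append(v)
    return x                                   # only locally-touched nodes appear
```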
Semi-supervised graph-based learning
Given a graph and a few labeled nodes, predict the labels on the rest of the graph.

Algorithm
1.  Run a diffusion for each label (possibly with negative information from the other classes).
2.  Assign new labels based on the value of each diffusion.
Traditional rounding methods
for SSL are value-based
[Figure: Zhou's diffusion values for three classes on an example point set, with the resulting CLASS 1 / CLASS 2 / CLASS 3 regions.]

VALUE-BASED: use the largest value of the diffusion to pick the label.
But value-based rounding doesn't work for all diffusions
[Figure panels: (b) Zhou et al., l = 3; (c) Andersen-Lang, l = 3; (d) Joachims, l = 3; (e) ZGL, l = 3; (f) Zhou et al., l = 15; (g) Andersen-Lang, l = 15; (h) Joachims, l = 15; (i) ZGL, l = 15.]

VALUE-BASED rounding fails for most of these diffusions, BUT there is still a signal there!
Adding more labels doesn't help either; see the paper for those details.
Rank-based rounding is far
more robust. 
[Figure: the three-class example, rounded by rank.]

NEW IDEA: Look at the RANK of the item in each diffusion instead of its VALUE.

JUSTIFICATION: Based on the idea of sweep-cut rounding in spectral methods (use the order induced by the eigenvector, not its values).

IMPACT: Much more robust rounding to labels.
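A minimal sketch of the two rounding rules on a matrix of per-class diffusion scores (the paper's exact tie-breaking may differ):

```python
# Value-based vs. rank-based rounding of per-class diffusion scores.
# scores: (n_nodes x n_classes) array; column j holds class j's diffusion values.
import numpy as np

def value_based(scores):
    # Assign each node the class with the largest raw diffusion value.
    return np.argmax(scores, axis=1)

def rank_based(scores):
    # Within each class, replace values by ranks (largest value -> rank 0),
    # in the spirit of sweep-cut rounding: only the order matters, not the magnitude.
    n, k = scores.shape
    ranks = np.empty((n, k))
    for j in range(k):
        order = np.argsort(-scores[:, j])      # best node in class j comes first
        ranks[order, j] = np.arange(n)
    # Pick, for each node, the class in which it is ranked best.
    return np.argmin(ranks, axis=1)
```

A node with a small absolute diffusion value can still be the top-ranked node for its class, which is exactly the signal that value-based rounding throws away.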
Rank-based rounding has a big impact on a real study.
[Two plots of error rate vs. average training samples per class, comparing Zhou and Zhou+Push: the left uses VALUE-BASED rounding, the right uses RANK-BASED rounding.]

We used the digit prediction task from Zhou's paper and added just a bit of noise as label errors and switched parameters.
Main empirical results

1.  Zhou's diffusion seems to work best for sparse graphs, whereas the ZGL diffusion works best for dense graphs.
2.  On the digits dataset, dense graph constructions yield higher error rates.
3.  Densifying a super-sparse graph construction on the digits dataset yields lower error.
4.  A similar fact holds on an Amazon co-purchasing network.
An illustrative synthetic problem shows the differences.

Two-class block model, 150 nodes each; between-block probability = 0.02; within-block probability = 0.35 (dense) or 0.06 (sparse).
Reveal labels for k nodes (varied), with different error rates: 0%/10% (low/high) for the sparse graph and 20%/60% (low/high) for the dense graph. A sketch of this construction follows below.

[Four plots of number of mistakes vs. number of labels, comparing Joachims, Zhou, and ZGL: sparse graph with low error; sparse graph with high error (the "real-world scenario"); dense graph with low error rate; dense graph with high error rate.]
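A sketch of this synthetic construction with the stated probabilities. The label-noise routine assumes the quoted error rates are label errors on the revealed nodes, which is an assumption about the setup, and the helper names are mine:

```python
# Two-class stochastic block model matching the parameters stated above.
import numpy as np

def two_block_model(n_per_class=150, p_within=0.06, p_between=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n = 2 * n_per_class
    labels = np.repeat([0, 1], n_per_class)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_within, p_between)
    A = (rng.random((n, n)) < probs).astype(float)
    A = np.triu(A, 1)                 # keep one copy of each edge, no self-loops
    return A + A.T, labels            # symmetric adjacency matrix

def reveal_labels(labels, k, error_rate, rng):
    # Reveal k node labels and flip each with probability error_rate (assumed label noise).
    revealed = rng.choice(len(labels), size=k, replace=False)
    return {int(v): int(1 - labels[v]) if rng.random() < error_rate else int(labels[v])
            for v in revealed}
```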
Varying density in an SSL
construction.
A_{i,j} = \exp( -\| d_i - d_j \|_2^2 / (2 \sigma^2) )

[Figure: the weighted nearest-neighbor graphs on the digit data d_i for kernel widths σ = 2.5 and σ = 1.25.]

We use the digits experiment from Zhou et al. 2003: 10 digits and a few label errors.
We vary density either by the number of nearest neighbors or by the kernel width.
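A sketch of this construction: Gaussian kernel weights kept only between nearest neighbors, so density can be varied through either k or σ. The function name and the max-symmetrization are my choices, not taken from the paper:

```python
# Weighted k-NN graph with Gaussian kernel weights A_ij = exp(-||d_i - d_j||^2 / (2 sigma^2)).
import numpy as np

def knn_kernel_graph(X, k=10, sigma=1.25):
    """X: (n_points x n_features) data, e.g. vectorized digit images."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # pairwise squared distances
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                                  # no self-loops
    A = np.zeros_like(W)
    for i in range(n):
        nbrs = np.argsort(-W[i])[:k]                          # the k most similar points
        A[i, nbrs] = W[i, nbrs]
    return np.maximum(A, A.T)                                 # keep an edge if either side chose it
```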
As density increases, the
results just get worse
[Two plots of error rate, comparing Zhou and Zhou+Push: "Varying kernel width" (σ from 0.8 to 2.5) and "Varying nearest neighbors" (5 to 250).]

•  Adding "more" edges seems to only hurt (unless there is no signal).
•  Zhou+Push seems to be slightly more robust (maybe).
Some observations and a
question.
Adding “more data” yields “worse results” for
this procedure (in a simple setting).

Suppose I have a real-world system that can
work with up to E edges on some graph. 
Is there a way I can create new edges?
Densifying the graph with
path expansions
A_k = \sum_{\ell=1}^{k} A^\ell

If A is the adjacency matrix, then this counts the total weight on all paths up to length k.
We now repeat the nearest neighbor computation, but with paired parameters such that we have the same average degree.

  Avg. Deg   Zhou, k = 1   Zhou, k > 1   Zhou w. Push, k = 1   Zhou w. Push, k > 1
  19         0.163         0.114         0.156                 0.117
  41         0.156         0.132         0.158                 0.113
  53         0.183         0.142         0.179                 0.136
  104        0.193         0.145         0.178                 0.144
  138        0.216         0.102         0.204                 0.101

(k = 4, nn = 3)
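A sketch of the path expansion A_k = Σ_{ℓ=1..k} A^ℓ with sparse matrices. Dropping the diagonal (closed walks back to the start) is my assumption; the paper may weight or truncate the sum differently:

```python
# Path-expansion densification: A_k = A + A^2 + ... + A^k on a sparse adjacency matrix.
import scipy.sparse as sp

def path_expansion(A, k):
    A = sp.csr_matrix(A)
    power, Ak = A.copy(), A.copy()
    for _ in range(2, k + 1):
        power = power @ A              # next power: walks one step longer
        Ak = Ak + power                # accumulate total walk weight up to this length
    Ak = Ak.tolil()
    Ak.setdiag(0)                      # drop closed walks (assumed; see note above)
    return Ak.tocsr()
```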
The same result holds for Amazon’s
co-purchasing network
         mean F1                  Confidence intervals
  k      Zhou    Zhou w. Push     Zhou            Zhou w. Push
  1      0.173   0.229            [0.15, 0.19]    [0.21, 0.25]
  2      0.197   0.231            [0.18, 0.22]    [0.21, 0.25]
  3      0.221   0.238            [0.17, 0.27]    [0.19, 0.28]
Amazon's co-purchasing network (from SNAP) is effectively a highly sparse nearest-neighbor network from their (denser) co-purchasing graph.

We attempt to predict the items in a product category
based on a small sample and study the F1 score for the
predictions. 
Some small details missing – see the full paper.
A_k = \sum_{\ell=1}^{k} A^\ell

[Figure 2, panels (a) K2 sparse, (b) K2 dense, (c) RK2; the caption (truncated in this transcript) describes artificially densifying the graph to A_k, comparing sparse and dense diffusions and regularization, with colors indicating the diffusions from the circled nodes and the unavoidable errors attributed to a mislabeling.]
Towards some theory, i.e. why are densified sparse graphs better?
How do sparsity, density, and regularization of a diffusion play into the results in a controlled setting?

[Figure annotations mark the given labels and "THE ERROR".]
Towards some theory, i.e. why are densified sparse graphs better? (continued)

[Figure 2 again, annotated with: the given labels, "THE ERROR", "Dense" diffusions on \sum_{\ell=1}^{5} A^\ell, "Regularization", and "Using Push algorithm".]
Recap, discussion, future work

Contributions
1.  Flow setup for SSL diffusions
2.  New robust rounding rule for class selection
3.  Localized Zhou's diffusion
4.  Empirical insights on density of graph constructions

Observations
•  Many of these insights translate to directed, weighted graphs with fuzzy labels and/or some parallel architectures.
•  Weakness: mainly empirical results on the density.
•  We need a theoretical basis for the densification theory!
Supported by NSF, ARO, DARPA
 CODE www.cs.purdue.edu/homes/dgleich/codes/robust-diffusions
