Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passing - Presentation

Subproblem-Tree Calibration: A Unified Approach
to Max-Product Message Passing
Varad Meru, Prolok Sundaresan
Department of Computer Science,
Donald Bren School of Information and Computer Science,
UC Irvine
December 10th, 2014
Citation: Wang, Huayan, and Daphne Koller. "Subproblem-tree
calibration: A unified approach to max-product message passing." In
Proceedings of the 30th International Conference on Machine Learning
(ICML-13), pp. 190-198. 2013.

Outline
Introduction
Background
MAP: Maximum a posteriori estimation.
LP relaxation, and dual decomposition
Bethe cluster graphs
STC Framework
Subproblem multi-graph and subproblem trees
Max-consistency and dual-optimal on trees
The STC algorithm
Fixed-point characterization
Choosing allocation weights
General primal solutions
Experimental Results
Conclusion

Introduction I
MAP-MRF: finding the most probable assignment for a Markov random
field (MRF), also known as the most probable explanation (MPE)
NP-hard in general
A large family of methods is based on solving a dual problem of an
LP relaxation.
Recent advances
Convergent versions of these algorithms can be interpreted as
block coordinate descent (BCD) in the dual.
Variants that operate on small blocks: the max-product linear
programming algorithm (MPLP), max-sum diffusion (MSD),
and tree-reweighted max-product message passing (TRW-S).
Given a block of dual variables, each variant enforces some consistency
constraint over the block.
Observation
Difficulties in generalizing these methods arise from their strong
consistency constraints, which are sufficient but not necessary for
dual-optimality on the block.

Introduction II
Aim
Show that dual-optimality on a block can be established for a much
broader choice of dual objectives.
Derive a "unified" message passing algorithm for an arbitrary
dual decomposition.
Properties of the Resulting Algorithm (subproblem-tree
calibration, or STC)
Passes messages on a graph object (the subproblem multi-graph, or
SMG)
Subsumes MPLP, MSD, and TRW-S
Achieves dual-optimality on blocks that can be chosen flexibly.

MAP Inference
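The MAP inference problem over variables $X = (X_1, \ldots, X_N)$ with graph structure
$G = (V, E)$ can be formulated as
$$\max_{X} \; \Theta(X)$$
where $\Theta(X) = \sum_{\alpha \in A} \theta_\alpha(X_\alpha)$ and $A$ is the set of MRF cliques.
An assignment is written $x = x_{1:N}$ with $x_i \in \mathrm{Val}(X_i)$.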
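To make the objective concrete, here is a minimal sketch that solves MAP by exhaustive enumeration on a hypothetical three-variable binary chain. The clique tables, `score`, and `brute_force_map` are illustrative names and values, not taken from the paper; enumeration is only feasible for tiny models, which is why the relaxations on the following slides are needed.

```python
from itertools import product

# Hypothetical toy MRF: three binary variables on a chain X0 - X1 - X2.
# Each clique potential theta_alpha is a table of log-potentials.
cliques = {
    (0,): {(0,): 0.0, (1,): 0.5},
    (0, 1): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
    (1, 2): {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 0.25, (1, 1): 0.0},
}
cardinalities = [2, 2, 2]


def score(x):
    """Theta(x): sum of clique potentials evaluated at the joint assignment x."""
    return sum(table[tuple(x[i] for i in scope)] for scope, table in cliques.items())


def brute_force_map(cards):
    """Enumerate every joint assignment; exponential in the number of variables."""
    return max(product(*(range(k) for k in cards)), key=score)


x_map = brute_force_map(cardinalities)
print(x_map, score(x_map))
```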

LP relaxation, Dual decomposition I
A large family of MAP inference methods is based on solving a
linear programming (LP) relaxation
$$\max_{\mu \in M} \; \Theta \cdot \mu$$
where $\mu = \{\mu_i(x_i), \mu_{ij}(x_i, x_j) \mid \forall i, x_i, (i, j), (x_i, x_j)\}$ and $\Theta$ is all
MRF parameters $\{\theta_i, \theta_{ij}\}$ concatenated in the same ordering as $\mu$.
A decomposition of $\Theta(X)$ into subproblems $c \in C$,
parameterized by $\{\Theta^c\}$, satisfies
$$\forall x: \quad \sum_{c \in C} \Theta^c(x|_c) = \Theta(x)$$
where $x|_c$ denotes restricting the joint assignment to the
scope of subproblem $c$.
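A small numerical sketch of why a decomposition yields a bound: if each subproblem is maximized independently on its own copy of the variables, the sum of those maxima can only be at least as large as the true MAP value. The two subproblem tables below are hypothetical and chosen so that the bound is not tight.

```python
from itertools import product

# Hypothetical subproblems on a binary chain X0 - X1 - X2.
# Each entry maps a name to (scope, table of log-potentials Theta^c).
subproblems = {
    "c1": ((0, 1), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.5}),
    "c2": ((1, 2), {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 0.25, (1, 1): 0.0}),
}


def total_score(x):
    """Theta(x) = sum_c Theta^c(x|c), restricting x to each subproblem's scope."""
    return sum(t[tuple(x[i] for i in scope)] for scope, t in subproblems.values())


# Primal optimum by enumeration (feasible only for tiny models).
primal_opt = max(total_score(x) for x in product(range(2), repeat=3))

# Dual value: each subproblem is maximized independently on its own copy.
dual_value = sum(max(t.values()) for _, t in subproblems.values())

print(primal_opt, dual_value)
```

Here the two subproblems disagree on the value of the shared variable X1 at their individual maxima, which is what leaves a gap between the two numbers.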

LP relaxation, Dual decomposition II
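The constraint is enforced by expressing the reparameterization in terms
of messages:
$$\hat{\Theta}^c = \Theta^c + \sum_{c' : X^c \cap X^{c'} \neq \emptyset} \delta_{c' \to c}(X^c \cap X^{c'})$$
where $\hat{\Theta}^c$ is the reparameterized potential of subproblem $c$ and
the messages satisfy $\delta_{c' \to c} = -\delta_{c \to c'}$.
Each subproblem has its own copy of the variables, $X^c$.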
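The antisymmetry of the messages is what keeps the decomposition valid: whatever one subproblem gains, its neighbor loses. The sketch below applies an arbitrary, hypothetical message over a shared variable to two toy tables and checks that the joint objective is unchanged for every assignment while the dual value moves. `apply_message` and `joint` are illustrative helper names.

```python
from itertools import product

# Hypothetical tables on a binary chain X0 - X1 - X2 (values are log-potentials).
theta_c1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.5}   # scope (X0, X1)
theta_c2 = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 0.25, (1, 1): 0.0}  # scope (X1, X2)

# An arbitrary message over the shared variable X1, indexed by its value.
delta_c2_to_c1 = {0: 0.5, 1: -0.5}


def apply_message(th1, th2, delta):
    """Add delta to c1 and subtract it from c2 (delta_{c1->c2} = -delta_{c2->c1})."""
    new1 = {(x0, x1): v + delta[x1] for (x0, x1), v in th1.items()}
    new2 = {(x1, x2): v - delta[x1] for (x1, x2), v in th2.items()}
    return new1, new2


def joint(th1, th2, x):
    """Total score of a joint assignment x = (x0, x1, x2)."""
    return th1[(x[0], x[1])] + th2[(x[1], x[2])]


hat_c1, hat_c2 = apply_message(theta_c1, theta_c2, delta_c2_to_c1)

# The joint objective is unchanged for every assignment ...
unchanged = all(
    abs(joint(theta_c1, theta_c2, x) - joint(hat_c1, hat_c2, x)) < 1e-12
    for x in product(range(2), repeat=3)
)
# ... but the dual value (sum of independent maxima) can move.
dual_before = max(theta_c1.values()) + max(theta_c2.values())
dual_after = max(hat_c1.values()) + max(hat_c2.values())
print(unchanged, dual_before, dual_after)
```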

Bethe cluster (region) Graph I
Bipartite structure: one layer of “factor” nodes and one layer
of small (usually unary) nodes.
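Restricted design, due to the historical concern of satisfying the
"running intersection property".
$$D(\{\delta_{f \to i}\}) = \sum_i \max_{X_i} \hat{\theta}_i(X_i) + \sum_f \max_{X_f} \hat{\Theta}_f(X_f)$$
where the hats denote potentials reparameterized by the messages, and
the messages are only defined between the two layers
(bipartite structure).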
The dual (mentioned earlier) becomes more restricted due to
the requirement of satisfying the running intersection property.

Bethe cluster (region) Graph II
(a) Markov Random Field
(b) Cluster Graph (not a Bethe Cluster Graph)

Bethe cluster (region) Graph III
(c) Bethe Cluster Graph
Figure 1: Cluster and Bethe Graph

SMG and subproblem-tree I
Subproblem Multi-Graph/Tree
Given $C$, the subproblem multi-graph (SMG) $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ has
one node for each $c \in C$ and one edge between $c$ and $c'$ for each
tuple $(c, c', \phi)$, where $\phi \in V \cup E$ (a node or edge of the MRF) is shared by $c$ and $c'$. A
subproblem tree is a tree $T \subset \mathcal{G}$.
If we include all unary subproblems in the decomposition,
we get an SMG similar to Figure 1(c), but with extra edges
among the non-unary subproblems.
So a tree in the Bethe cluster graph (which we call a Bethe
tree) is also a subproblem tree by definition.
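The definition can be sketched in code as follows. This is an illustration under stated assumptions: each subproblem is described only by its set of variables, and an MRF node or edge counts as "shared" when all of its variables lie in both scopes. `build_smg` and `spanning_tree` are hypothetical helpers, and the spanning tree is picked greedily here rather than at random.

```python
from itertools import combinations

# Hypothetical MRF: chain 0 - 1 - 2, and three subproblems given by their scopes.
mrf_nodes = [0, 1, 2]
mrf_edges = [(0, 1), (1, 2)]
scopes = {"a": {0, 1}, "a2": {0, 1}, "b": {1, 2}}


def build_smg(scopes, nodes, edges):
    """One SMG edge (c, c', phi) per MRF node or edge phi shared by c and c'."""
    phis = [(i,) for i in nodes] + list(edges)
    smg = []
    for c, c2 in combinations(sorted(scopes), 2):
        for phi in phis:
            if set(phi) <= scopes[c] and set(phi) <= scopes[c2]:
                smg.append((c, c2, phi))
    return smg


def spanning_tree(smg_edges, names):
    """Greedy spanning tree via union-find; keeps at most one edge per merge."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    tree = []
    for c, c2, phi in smg_edges:
        root, root2 = find(c), find(c2)
        if root != root2:
            parent[root] = root2
            tree.append((c, c2, phi))
    return tree


smg = build_smg(scopes, mrf_nodes, mrf_edges)
tree = spanning_tree(smg, scopes)
print(len(smg), tree)
```

Subproblems `a` and `a2` share two nodes and one edge, so they are joined by three parallel SMG edges, which is why the structure is a multi-graph rather than a simple graph.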

SMG and subproblem-tree II

SMG and subproblem-tree III
For each SMG edge $(c, c', \phi) \in \mathcal{E}$, we have messages
$\delta_{c' \to c} = -\delta_{c \to c'}$. Therefore the block (of dual variables)
associated with a subproblem tree $T$ is given by:
$$B_T = \{\delta_{c' \to c}(X_\phi) : (c, c', \phi) \in T\}. \quad (1)$$

Max-consistency and dual-optimal trees I
Given a block $B_T$ associated with some subproblem tree $T$, we
want to achieve dual-optimality w.r.t. that block.
Dual-optimal on T
The subproblem potentials $\{\Theta^c\}$
are dual-optimal on $T$ if we cannot
further decrease the dual objective by changing the messages in $B_T$.
A message passing algorithm achieves dual-optimality by
enforcing some consistency constraint.
We first identify a constraint that is equivalent to being dual-optimal on $T$.
Assignments agree on T
Assignments to all subproblems $\{x^c\}_{c \in T}$ agree on $T$, denoted
$\{x^c\} \sim T$, if for all $(c, c', \phi) \in T$ we have
$x^c_\phi = x^{c'}_\phi$.

Max-consistency and dual-optimal trees II
Weak max-consistency on T
$\{\Theta^c\}_{c \in T}$ satisfies weak max-consistency if
$$\sum_{c \in T} \max_{X^c} \Theta^c(X^c) = \max_{\{X^c\} \sim T} \sum_{c \in T} \Theta^c(X^c)$$
Maximizing each subproblem independently reaches the same
optimal value as maximizing them while requiring the
assignments to agree on the tree.
Let $M^c_\phi$ be the (log-)max-marginal of $c$ on $\phi$; then
$$M^c_\phi(x_\phi) = \max_{X^c|_\phi = x_\phi} \Theta^c(X^c)$$
If $\phi = (i, j) \in E$, then $X^c|_\phi = x_\phi$ means $X^c_i = x_i$ and $X^c_j = x_j$.
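Both quantities are easy to compute on a toy example. The sketch below assumes a subproblem tree with just two subproblems that share one binary variable X1; the tables and the helper names `max_marginal` and `weakly_max_consistent` are hypothetical. For these particular tables the independent maxima disagree on X1, so weak max-consistency fails.

```python
from itertools import product

# Hypothetical subproblem tables (log-potentials) sharing the variable X1.
theta_c1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.5}   # scope (X0, X1)
theta_c2 = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 0.25, (1, 1): 0.0}  # scope (X1, X2)


def max_marginal(theta, position, card=2):
    """M(x_phi): max over table entries whose coordinate `position` equals x_phi."""
    return [max(v for k, v in theta.items() if k[position] == x) for x in range(card)]


def weakly_max_consistent(th1, th2, tol=1e-9):
    """Compare independent maximization with maximization under agreement on X1."""
    independent = max(th1.values()) + max(th2.values())
    agreeing = max(
        th1[(x0, x1)] + th2[(x1, x2)] for x0, x1, x2 in product(range(2), repeat=3)
    )
    return abs(independent - agreeing) <= tol


m_c1 = max_marginal(theta_c1, 1)  # X1 is the second coordinate of c1
m_c2 = max_marginal(theta_c2, 0)  # X1 is the first coordinate of c2
print(m_c1, m_c2, weakly_max_consistent(theta_c1, theta_c2))
```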

Max-consistency and dual-optimal trees III
Strong max-consistency on T
$\{\Theta^c\}_{c \in T}$ satisfies strong max-consistency if
$$M^c_\phi = M^{c'}_\phi \quad \forall (c, c', \phi) \in T$$
The relations among these consistency constraints are:
Proposition 1.
For any Bethe tree $T$,
MPLP max-consistency $\Rightarrow$ weak max-consistency.
For any subproblem tree $T$ (including Bethe trees),
strong max-consistency $\Rightarrow$ weak max-consistency, and
weak max-consistency $\Leftrightarrow$ dual-optimal on $T$.
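The implication from strong to weak max-consistency does not run the other way. As a hedged illustration, the hypothetical pair of tables below (two subproblems sharing a binary variable X1) is weakly max-consistent even though the two max-marginals on X1 differ; all names and values are made up for this sketch.

```python
from itertools import product

# Hypothetical subproblem tables (log-potentials) sharing the variable X1.
theta_c1 = {(0, 0): 1.5, (0, 1): -0.5, (1, 0): 1.0, (1, 1): 1.0}   # scope (X0, X1)
theta_c2 = {(0, 0): -0.5, (0, 1): 1.5, (1, 0): 0.75, (1, 1): 0.5}  # scope (X1, X2)


def max_marginal(theta, position, card=2):
    """Max over table entries whose coordinate `position` equals each value."""
    return [max(v for k, v in theta.items() if k[position] == x) for x in range(card)]


# Strong max-consistency: the max-marginals on the shared variable coincide.
m_c1 = max_marginal(theta_c1, 1)
m_c2 = max_marginal(theta_c2, 0)
strong = all(abs(a - b) < 1e-9 for a, b in zip(m_c1, m_c2))

# Weak max-consistency: independent maxima match the best agreeing assignment.
independent = max(theta_c1.values()) + max(theta_c2.values())
agreeing = max(
    theta_c1[(x0, x1)] + theta_c2[(x1, x2)]
    for x0, x1, x2 in product(range(2), repeat=3)
)
weak = abs(independent - agreeing) < 1e-9

print(m_c1, m_c2, strong, weak)
```

Both subproblems attain their maximum at X1 = 0, so the independent maxima already agree; the mismatch at X1 = 1 is irrelevant to the dual value.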

Subproblem tree calibration algorithm I
The algorithm calibrates a subproblem tree with an upstream pass
and a downstream pass.
Both passes update the subproblem potentials "in place" without storing
any messages.
[Figure 2 panels: (a) MRF; (b) SMG; (c) spanning tree of the SMG]
Figure 2: Flow of the algorithm: start with (a), generate (b),
randomly select (c), and "calibrate" it.

Subproblem tree calibration algorithm II
Algorithm
1. Start from the given MRF (Figure 2(a)).
2. Split it into subproblems (dual decomposition).
3. Build a multi-graph with a node for each subproblem (Figure 2(b)).
4. Repeat:
a. Randomly choose a subproblem tree (Figure 2(c)).
b. "Calibrate" the tree by max-product / min-sum message
passing.
Properties
1. Each tree calibration is a block coordinate descent step for the
dual problem.
2. The "block" corresponds to all edges in the subproblem tree.
3. Subsumes MPLP, TRW-S, and max-sum diffusion as special
cases.
4. Handles larger and more flexible "blocks" than these methods.
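The calibration step can be illustrated on the smallest possible subproblem tree: two subproblems joined by a single SMG edge. The sketch below is not the upstream/downstream procedure from the paper; it swaps in the simpler max-sum-diffusion-style update that averages the two max-marginals on the shared variable, which is one way to make a single edge max-consistent. Tables and function names are hypothetical.

```python
# Hypothetical subproblem tables (log-potentials) sharing the variable X1.
theta_c1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.5}   # scope (X0, X1)
theta_c2 = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 0.25, (1, 1): 0.0}  # scope (X1, X2)


def max_marginal(theta, position, card=2):
    """Max over table entries whose coordinate `position` equals each value."""
    return [max(v for k, v in theta.items() if k[position] == x) for x in range(card)]


def dual(th1, th2):
    """Dual value: sum of the independent subproblem maxima."""
    return max(th1.values()) + max(th2.values())


def calibrate_edge(th1, th2):
    """Average the max-marginals on X1 and update both tables in one step."""
    m1 = max_marginal(th1, 1)  # X1 is the second coordinate of c1
    m2 = max_marginal(th2, 0)  # X1 is the first coordinate of c2
    delta = [(b - a) / 2.0 for a, b in zip(m1, m2)]  # message c2 -> c1
    new1 = {(x0, x1): v + delta[x1] for (x0, x1), v in th1.items()}
    new2 = {(x1, x2): v - delta[x1] for (x1, x2), v in th2.items()}
    return new1, new2


dual_before = dual(theta_c1, theta_c2)
cal_c1, cal_c2 = calibrate_edge(theta_c1, theta_c2)
dual_after = dual(cal_c1, cal_c2)

print(dual_before, dual_after)
print(max_marginal(cal_c1, 1), max_marginal(cal_c2, 0))
```

After the update the two max-marginals on X1 coincide, so the edge satisfies strong (and therefore weak) max-consistency, and the dual value has dropped to the MAP value of this toy problem.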

Subproblem tree calibration algorithm III

Choosing allocation weights
After STC, for each subproblem $c$,
$$\max_{X^c} \Theta^c(X^c) = a_c \cdot \max_{\{X^{\bar{c}}\} \sim T} \sum_{\bar{c} \in T} \Theta^{\bar{c}}(X^{\bar{c}})$$
where $a_c$ is the allocation weight of subproblem $c$.
The downstream pass allocates "energy" to all subproblems
according to their allocation weights.
"Energy" = negative logarithm of the probabilities. Working with
energies avoids numerical underflow for very small values and
makes the computations easier to handle, since products become
sums (moving from max-product to min-sum).
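The underflow point can be seen directly in double-precision arithmetic. The numbers below are hypothetical: one hundred factors of 1e-5 multiply to 1e-500, which is far below the smallest positive double, while the corresponding energy is an ordinary finite number.

```python
import math

# Hypothetical factor values: one hundred small probabilities.
probs = [1e-5] * 100

# Multiplying in the probability domain underflows to exactly 0.0.
product_value = 1.0
for p in probs:
    product_value *= p

# Summing energies (negative log-probabilities) stays well within range.
energy = sum(-math.log(p) for p in probs)

print(product_value, energy)
```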

General Primal solution
Given subproblem potentials, solutions to the original MAP
inference problem can be constructed in different ways.
Visit the variables (in the original MRF) in some ordering, for
example $X_1, X_2, \ldots, X_N$. For $X_i$ we choose the
assignment
$$x_i = \arg\max_{x_i} \sum_{c : i \in \mathrm{scope}(c)} \max_{X^c \setminus X_i} \Theta^c(X^c \mid X_j = x_j, \forall j < i)$$
When visiting each $X_i$, we choose its assignment to maximize the
sum of the max-marginals from all subproblems covering $X_i$, then
fix $X_i = x_i$ in all subproblems.
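The decoding rule above can be sketched as a short sequential procedure. The subproblem tables are hypothetical (two subproblems on a binary chain X0 - X1 - X2), and `conditional_max_marginal` and `decode` are illustrative names; the sketch assumes full tables, so every conditioning leaves at least one consistent entry.

```python
# Hypothetical subproblems: (scope, table of log-potentials).
subproblems = [
    ((0, 1), {(0, 0): 1.5, (0, 1): -0.625, (1, 0): 1.0, (1, 1): 0.875}),
    ((1, 2), {(0, 0): -0.5, (0, 1): 1.5, (1, 0): 0.875, (1, 1): 0.625}),
]
cardinalities = [2, 2, 2]


def conditional_max_marginal(scope, table, i, xi, fixed):
    """Max of the table over entries with X_i = xi that respect earlier choices."""
    best = float("-inf")
    for key, value in table.items():
        assign = dict(zip(scope, key))
        if assign[i] != xi:
            continue
        if any(assign[j] != v for j, v in fixed.items() if j in assign):
            continue
        best = max(best, value)
    return best


def decode(subproblems, cards):
    """Visit variables in index order, fixing each to its best-scoring value."""
    fixed = {}
    for i, k in enumerate(cards):
        def total(xi):
            # Sum of conditional max-marginals over subproblems covering X_i.
            return sum(
                conditional_max_marginal(s, t, i, xi, fixed)
                for s, t in subproblems
                if i in s
            )
        fixed[i] = max(range(k), key=total)
    return [fixed[i] for i in range(len(cards))]


assignment = decode(subproblems, cardinalities)
print(assignment)
```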

Experimental MAP inference tasks I
1. The protein design benchmark
20 largest problems from that dataset
Number of Variables - 101 to 180
Number of Edges - 1973 to 3005
Variable Cardinality - 154
2. Synthetic 20-by-20 grid
Potentials from N(0, 1)
Variable Cardinality - 100
3. "Object detection" task from PIC-2011
37 problem instances
Number of Variables - 60 / problem instance
Number of Edges - 1770 / problem instance
Variable Cardinality - 11 to 21

Experimental MAP inference tasks II
We observe that different methods tend to "converge" to different
dual objective values, even though the dual objectives in each plot
should have exactly the same optimal value.

Experimental MAP inference tasks III

Conclusion
Two dimensions of flexibility in designing a message passing
algorithm for MAP inference:
Choosing blocks to update
Choosing a dual state on a plateau in each BCD step
The STC algorithm can be applied with extreme flexibility in these
choices.
Finding principled and adaptive strategies for making these
choices will help design much more powerful message passing
algorithms.

Thank You
Questions?
