Algorithm Design and Complexity - Course 7

Algorithm Design and Complexity
Course 7

Overview






Graphs – Introduction
Representing Graphs
Practical Examples of Using Graphs
Search Algorithms





BFS
DFS

Topological Sorting

Graphs – Introduction







Graphs are very important data structures as the
model a lot of real-life objects
There are a lot of problems that must be solved for
graphs
Some of them are very difficult (NP-complete)
Others are in P
In this chapter, we shall consider problems that are
in P, therefore accept polynomial time algorithms for
finding the exact solution

Very Useful in Practice
There exist a lot of
Open-Source libraries
for graphs:
- Graphviz
- http://www.graphviz.org/
- Prefuse
- http://prefuse.org/
- Flare
- http://flare.prefuse.org/
- Gephi


-

Useful for:
-

Generating graphs
Visualizing graphs
Modeling graphs

Representing Graphs


As input data:



Pairs of vertices representing the edges: (Src, Dest)
Specialized data formats for representing graphs






RDF
GraphML
dot

In memory, for designing algorithms





Adjacency lists
Adjacency matrix
Incidence matrix

Dot
graph G {node;
Dristor2--Muncii--Iancului—Obor;
Piata_Victoriei1--Gara_de_nord--Crangasi--Grozavesti--Eroilor;

Pacii--Lujerului--Politehnica--Eroilor;
Republica--Titan--Dristor1--Timpuri_Noi--Unirii1--Izvor--Eroilor;
Dristor1--Dristor2;
Unirii1--Unirii2;
Piata_Victoriei1--Piata_Victoriei2;
Piata_Sudului--Eroii_Revolutiei--Tineretului--Unirii2--Romana-Piata_Victoriei2--Aviatorilor--Pipera;
}

GraphML
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
<graph edgedefault="undirected">

<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="gender" for="node" attr.name="gender" attr.type="string"/>

<node id="1">
<data key="name">Jeff</data>
<data key="gender">M</data>
</node>
<node id="2">
<data key="name">Ed</data>
<data key="gender">M</data>
</node>
<edge source="1" target="2"></edge>
</graph>
</graphml>

Adjacency Lists


An array with |V| elements

A

One entry for each vertex
 Each component in the array contains the list
of neighbors for the index vertex
 Adj[u] ; u
V


I

B

B

H

C

D

E

F

F
G

C
H

A
K

K

K

A

J

B

B

I

J

G

L

L
D

L

E
F

E

D

H

A

G

C

J

K

Adjacency Matrix
A[i, j] =
1 if (i, j)
0 if (i, j)

A

E
E

A

B

C

D

E

F

1

G

H

I

J

K

1

L

1

1
1

B
C

1

1

D
E

I

1

F
1

G
A

J

H

1

I

B

1

J
K

G

K

C

L

H
L
D
E
F

1

1
1

Incidence Matrix


B[u, e]






u V
e E
= 1 if edge e leaves vertex u
= -1 if edge e enters vertex u
= 0 otherwise

Adjacency Matrix vs Adjacency Lists




Which one is better ?
Answer: It depends
Graphs may be:





Adjacency matrix:











Space required: (n2)
Time for going through all edges: (n2)
Time for finding if an edge exists: (1)

Adjacency lists:




Sparse: m = O(n)
Dense: m = (n2)

Space required: (n+m)
Time for going through all edges: (n+m)
Time for finding if an edge exists: (max(|Adj(u)|))

|V| = n, |E| = m
Matrix is better for dense graphs, lists are better for sparse graphs

Practical Examples of Using Graphs


Maps (roads, etc.), networks (computer, electric,
etc.), Web, flow networks (traffic – cars or
computer networks, pipes, etc.), relations
between processes/activities…



Simple examples:




The shortest path on Google Maps.
The people that are most central (or most important) in
a social network
Google PageRank

The Web + PageRank

http://en.wikipedia.org/wiki/PageRank

Computer Networks

http://ist.marshall.edu/ist362/pics/OSPF.gif

Semantic Web


Source: http://www.semanticfocus.com/media/insets/rdf-graph.png

Linked Open Data on the Web
(http://richard.cyganiak.de/2007/10/lod/)

Graph Search Algorithms


Graph search (traversal)





The result is a list of nodes in the order that they are
visited
This list should be useful to us!




A methodology to go through all the nodes in a graph

Should contain important information about the graph

Graph search are very simple algorithms, but have
quite some applications



Lee’s algorithm for BFS
Topological sorting, SCC, articulation points for DFS

Data & Notations









G = (V, E) the graph (either directed or undirected)
 V – set of vertices (|V| = n)
 E – of edges (|E| = m)
The edges are not weighted
 May also consider them to have equal weights: 1

(u,v) – an edge from node u to node v;
u..v – path from u to v; we can also denote intermediate
nodes u..x..v, u..y..v
R(u) - reachable(u) = the set of nodes that can be
reached by a path starting from node u
Adj(u) – nodes that are adjacent to node u

Data & Notations (2)


color(u) – color of node u – encodes the state of node u
during the search:




White – undiscovered by the search;
Grey – discovered, but not finished;
Black – finished (any finished node is also discovered)



We have discovered all nodes in Adj(u) – for BFS
We have discovered and finished all nodes in R(u) – for DFS



p(u) – parent of u – encodes the previous node on the
path used to discover node u in the search



Other notations exist, but are specific to BFS or DFS

Breadth First Search (BFS)


Uses a start vertex for the traversal: s



Objective: determine the minimum number of edges
(shortest path considering that all the edges have the
same weight = 1) between the source s and all the other
vertices of the graph



We also traverse the vertices in the order of their
distance from the source vertex



δ(s,u) – the cost of a optimum path s..u; δ(s,u) = ∞ <=> u
R(s)
d[u] = d(s,u) – the cost of the current discovered path s..u



BFS


For each node u, we store:




d[u] = d(s,u) – current distance from node s to node u
p[u]
color[u]



We also use a queue for storing the discovered
nodes that are not yet finished



The predecessors form a tree called a BFS tree




The edges are (p[u], u), where p[u] != NULL
The source s is the root
The level of each node in the BFS tree is d[u]

BFS – Algorithm
BFS(G, s)
FOREACH (u ∈ V)
p(u) = NULL; d[u] = INF; color[u] = WHITE; // initialization
Q=
// the queue
d[s] = 0;
color[s] = GREY
Q.push(s)
WHILE (Q.length() > 0)
// while we still have discovered nodes
u = Q.pop()
FOREACH (v ∈ Adj(u))
// for all neighbors of u
IF (color[v] == WHITE)
// if the node is undiscovered
d[v] = d[u] + 1
p[v] = u
color[v] = GREY
Q.push(v)
color[u] = BLACK
// the current node is finished

Complexity


Depends on how the graph is stored





It influences how we iterate through Adj(u)

(n+m) for adjacency lists
(n2) for adjacency matrix
WHILE (Q.length() > 0)
u = Q.pop()
// do something



// at most n times – once for each node
// n+m times for adjacency lists

It makes sense to use adjacency lists for BFS

Example
I

Source = A
A

Q = A; d(A) = 0
p(A) = null

J

B

I

G
H

G

L

H
E

Q = B, C
d(C) = 2

Q = C, H
d(H) = 2

I
A

A

J

B

I
A

J

B
K

C

H

D

K
C

L

E

H

D

p(C) = B

G

K
C

H

D

F

Q=E

H

D

F

p(H) = G p(D) = p(E) = C

F

Q=Ø
I

I

A

A

J

B

A

J

B

C

H

L D

K

F

C
L

H

D
E

F

p(F) = E

J

B

G

K

E

E

F

I

K
C

L

L

Q=F
d(F)=4

G

/
G

H
E

F

J

B

E

E
F

L

A

K
D

I
J

B

G

G

L

Q = H, D, E
Q = D, E
d(D)=d(E)=3

I

G
C

D

p(B) = A
p(G) = A

J

B
K

D
F

A

J

B
C

E

I

A

K
C

Q = G, B
d(B) = d(G) = 1

G

K
C

L

H

L

D
E

F

BFS – Properties & Correctness


While running BFS on graph G starting from source s:
v∈Q ⇔ v∈R(s)
 Not all the nodes in G are traversed
 Only the nodes that are reachable from s
 The others remain unexplored therefore d(u) = δ(s, u) =
INF u R(s)
(u,v) ∈ E, δ(s,v) ≤ δ(s,u) + 1




Usually, δ(s,v) = δ(s,u) + 1 when p(v) = u

BFS – Properties & Correctness (2)


Loop invariants – for the outer loop
Let S = {nodes that have been popped out the queue}



color[u] =











!= NULL
NULL

if u Q U S
if u V Q S

d[u] =





if u S
if u Q
if u V Q S

p[u] =




BLACK
GREY
WHITE

!= INF
INF

if u Q U S
if u V Q S

Let Q = {v1, …, vp} ; p >= 1


d[v1]

d[v2]

…

d[vp]

BFS – Properties & Correctness (3)



d[v1] d[vp]
Why?





Because the white neighbors of v1 are pushed in the queue
and they have d[u] = d[v1] +1
But, d[v1] … d[vp] d[u] = d[v1] +1

Therefore the nodes are added to the queue in order of
their d[u]




d[v1] + 1

And never change again

Using all the above properties, we can prove that d[v] =

δ(s,v)


v V

Thus BFS is correct!

Depth First Search (DFS)



There is no source vertex
All the vertices of the graph are traversed
This traversal does not compute the shortest
distance between vertices, but has a lot of useful
applications



It computes two elements for each vertex:








d[u] = discovery time for vertex u
f[u] = finish time for vertex u
It uses a discrete time between 1.. 2*n that is incremented
each time a value is assigned to the discovery or finish
time of a node

Discovery and finish time


Discovery of a vertex u:





When the vertex is seen for the first time in the traversal
Changes color from WHITE to GREY

Finishing a vertex u:





When the search leaves the vertex
All the nodes that could have been discovered from that
vertex are either GREY or BLACK
There is no WHITE vertex in R(u)
Changes color from GREY to BLACK

Data Structures


We need a stack (LIFO) in order to implement the
traversal in order to discover all the nodes in order of
how they are reached




For each node u, we use:








The predecessors are lower in the stack
p[u]
color[u]
d[u]
f[u]

color[u] and p[u] have the same meaning as for BFS

DFS Trees


Similar to BFS, the predecessors form a forest of
DFS trees:








The edges are (p[u], u), where p[u] != NULL
The roots of the DFS trees are those vertices that have
p[u] = NULL
There is a forest as it might be more than a single tree

All the vertices in R(u) that are WHITE when u is
discovered become descendents of u in the DFS
tree
All the ancestors of u in the DFS tree are colored in
GREY (including the source)

DFS - Algorithm
DFS(G)
FOREACH (u ∈ V)
color[u] = WHITE; p[u] = NULL;

// initialization

time = 0;
FOREACH (u ∈ V)
IF (color[u] == WHITE)
DFS_Visit (u);

DFS_Visit(u)
d[u] = ++time

// choose the roots of the DFS trees
// start exploring the node

// exploring a node
// the node is discovered

color[u] = GREY

// look for undiscovered neighbors

p[v] = u
DFS_Visit(v)

// continue exploring this vertex

color[u] = BLACK
f[u] = ++time

// the node is finished

Complexity


DFS_Visit is called exactly once for each vertex of
the graph





Therefore the complexity would be:




When the vertex is WHITE
When it is discovered

n * complexity(DFS_Visit without the recursive call)

Therefore, similar to BFS:



(n+m) if using adjacency lists
(n2) if using adjacency matrix

DFS - Example
The notation next to each vertex means d[u]/f[u]



I 17/
I

A 1/16
A

J 18/

J

B 2/5

B

G

K

K

G 6/15

C

C 7/14

H

H 3/4

L

L

D

D 8/9
E
F

E 10/13
F 11/12

DFS – Example (2)


Some of the steps have been omitted
I
A

J
B

G

K
C
H
D

L
E
F

A

A

J

B
G
H

D

G
H

D

F

H

D

F

G
H

D

G
C

H

D

F

J

B
K

G

K
C

L

H

D
E

E

E
F

L

A

J

B
K

C
L

A

J

B
K

E

E

E

G
C

L

A

I

I

I
J

B
K

C
L

A

J

B
K

C

I

I

I

F

F

L

DFS Tree – Example


In fact, the edges have the opposite sense than the
one represented in the figure
I

A

J
B

G

K
C
H
L
D
E
F

DFS – Properties


The DFS forest can be formally defined as:



Arb(G) = {Arb(u); p(u) = NULL}



Arb(u) = (V(u), E(u))





If G is undirected




V(u) = {v | d(u) < d(v) < f(u)} + {u};
E(u) = {(v, z) | v in V(u), z in V(u) && p(z) = v}

G is a connected graph <=> Arb(G) has a single tree

For a given graph, running DFS may build different DFS
forests (and thus different DF traversals)



Depending on the order of choosing the roots of the trees
Depending on how the elements in Adj(u) are chosen

Parenthesis Theorem


u, v V, there are three correct alternatives to
arrange the discovery and finish times of the two
nodes:


d[u]











f[u]

(u … (v … v) … u)

f[u]

d[v]

f[v]

(u … u) (v … v)

there is no direct descendent relation between u and v

d[v]


f[v]

v is a descendant of u in the DFS tree

d[u]


d[v]

f[v]

d[u]

f[u]

(v … v) (u … u)

there is no direct descendent relation between u and v
u and v may be in different trees or on different paths in the
same tree

(u = discovery of u
u) = finishing of u

White Path Theorem


At time d[u], any node v that is:




WHITE and
Reachable from u (in R(u))
There exists a path u..v that consists of only WHITE
vertices (except u that is GRAY)



Shall become a descendant of u in the same DFS
tree that u belongs too



Alternative:


v is a descendent of u in a DFS tree <=> there exists a
path that consists of only WHITE vertices (except u that is
GRAY)

Edge Classification
(u, v) E is in one of the following classes:




Tree edge




Back edge




Any edge from a node to one of its descendants that are not its
children

Cross edge




Any edge from a node to one of its ancestors in the DFS tree

Forward edge




Any edge that is part of a DFS tree

Any other edge that cannot be classified in one of the above
classes

Which are the colors of the two vertices ?

Edge Classification (2)


Tree edge





Back edge







Any edge from a node to one of its descendants that are not its children
(u, v) => u – GREY ; v - BLACK

Cross edge





Any edge from a node to one of its ancestors in the DFS tree
(u, v) => u – GREY ; v - GREY

Forward edge




Any edge that is part of a DFS tree
(u, v) => u – GREY ; v - WHITE

Any other edge that cannot be classified in one of the above classes
(u, v) => u – GREY ; v - BLACK

How can we classify a cross edge from a forward edge?


Use the relationship between d[u] and d[v]

Edge Classification (3)


An undirected graph only has two types of edges:



Tree edges
Back edges



There cannot be any forward edges (because the
edges have no orientation) or cross edges



Theorem: A directed graph G is acyclic <=> G has no
backward edges in a DFS search



Demo: on whiteboard
The same is true for undirected graphs as well

Application of Graph Searches


BFS:


Finding the minimum path in a maze with obstacles, a
source and a destination




Called Lee’s algorithm

DFS:







Topological Sorting
Strongly Connected Components
Articulation Points
Bridges
Biconnected Components

Topological Sorting



Given a DAG (Directed Acyclic Graph)
Used in real applications:


Activity diagrams:










Nodes: activities
Edges: dependencies between activities

Bayesian networks
Combinatorial logic
Compilers

We want to order the vertices: find A[1..n] such that
for (u, v) E and A[i] = u, A[j] = v => i < j

Topological Sorting – General


It is used to sort partially sorted sets






Let S– the set
∝ - the partial order relationship






Sets for which we can define a partial order relationship
There are elements that cannot be ordered

∝:SxS

We call a topological sort of S, a list A = {s1, …,sn} that
consists of all the elements from S, such that for any si ∝
sj => i < j
In the case of DAGs, the relationship are given by the
orientation of the edges: si ∝ sj <=> (si, sj) E

Topological Sorting – Example


Source: http://serverbob.3x.ro/IA/images/fig572_01_0.jpg



The example is from CLRS

Topological Sorting – Idea









Run a DFS on the graph, regardless how the DFS
trees root vertices are chosen
Also use a list for storing the topological sorting
After finishing a vertex, append it to the beginning of
the list

At the end, all the elements in the list shall be
ordered by their finish time:
A = (v1, v2, …, vn) => f[vi] > f[vj] 1 i < j n
Remark: A DAG may have more than a single
topological sorting

Algorithm
DFS(G)
FOREACH (u ∈ V)
color[u] = WHITE; p[u] = NULL;

A=
FOREACH (u ∈ V)
IF (color[u] == WHITE)

// initialization

// the list for storing the topological sorting

DFS_Visit (u, A)

PRINT A

// print the topological sorting

DFS_Visit(u, A)

// A is transmitted by reference

color[u] = GREY

// look for undiscovered neighbors

p[v] = u
DFS_Visit(v, A)

ELSE IF (color[v] == GREY)

print ‘ERROR: This is not a DAG!’
EXIT
color[u] = BLACK
A = cons(u, A)

// insert in front of list A

Correctness


We need to show that



Let’s consider that we explore (u, v)


It means that u is GREY



(u, v) E : f[v] < f[u]

Is v:




GRAY? NO, because (u, v) would be a back edge in a DAG.
Impossible!
WHITE? Then d[u] < d[v] < f[v] < f[u] (Parenthesis theorem, Tree
edge)
BLACK? Then d[v] < f[v] < d[u] < f[u] (Cross edge)
or d[u] < d[v] < f[v] < f[u] (Forward edge)

Conclusions


We have seen that graphs are very important in
modeling structures from the real world



Graph traversals are very simple algorithms



They are also very useful



Lots of applications



We have seen one of them: topological sorting

References


CLRS – Chapter 23



MIT OCW – Introduction to Algorithms – video
lecture 17

Algorithm Design and Complexity - Course 7

More Related Content

What's hot (20)

Similar to Algorithm Design and Complexity - Course 7 (20)

More from Traian Rebedea (20)

Recently uploaded (20)

Algorithm Design and Complexity - Course 7