Part VI Graph Algorithms VI.1
Outline:
Chapter 22: Representation of graphs and
breadth-first and depth-first searching
Chapter 23: Minimum-weight spanning trees --
an example of a greedy algorithm
Chapters 24 & 25: Given weights on the edges,
Chapter 24 shows how to compute shortest
paths from a given vertex to all other
vertices; Chapter 25 shows how to compute
shortest paths between all pairs of vertices
Chapter 26: Shows how to compute the maximum
flow in a directed graph from a source to a
sink vertex, given edge capacities
Given a graph G = (V,E), we usually measure
the size of the input to an algorithm in terms
of the two numbers |V| and |E| (not just one n
as before).
Convention: Inside asymptotic notation, and
ONLY there, we use just V and E instead of
|V| and |E|. E.g.: O(V+E) (not O(|V|+|E|) )
Convention: In pseudocode, we denote the
vertex set of G by V[G], and the edges by
E[G], so they are viewed as attributes.
Chapter 22 Elementary Graph Algorithms 22.0.1
Techniques for searching graphs are at the
heart of the field of graph algorithms.
Section 22.1 discusses the two most common
representations of graphs: as adjacency lists
and as adjacency matrices.
Sections 22.2 and 22.3 present the two most
common graph-search algorithms: breadth-first
and depth-first search.
Sections 22.4 and 22.5 give two applications
of depth-first search: topological sorting
of a directed acyclic graph, and finding the
strongly connected components of a directed
graph.
22.1 Representation of Graphs 22.1.1
Two standard ways to represent G = (V,E):
1) The adjacency-list representation is good
for sparse graphs -- those for which |E| is
much less than |V|^2. Most of the graph
algorithms of the text assume that an input
graph is represented this way.
2) The adjacency-matrix representation is good
for dense graphs -- those for which |E| is
close to |V|^2 -- or when we need to be able
to tell quickly if there is an edge between
two given vertices (as in some all-pairs
shortest-paths algorithms in Chapter 25).
The adjacency-list representation of a graph
G = (V,E) consists of an array Adj of |V|
lists, one for each vertex in V. For each
vertex u in V, Adj[u] contains all vertices v
such that there is an edge (u,v) from u to v
in E. Figure 22.1(b) is an adjacency-list
representation of the undirected graph in
Figure 22.1(a); similarly, Figure 22.2(b) is
an adjacency-list representation of the
directed graph in Figure 22.2(a).
If G is directed, the sum of the lengths of
all the adjacency lists is |E|; if G is an
undirected graph, the sum of the lengths of
all the adjacency lists is 2|E|, since if
(u,v) is an edge, u appears in v's adjacency
list and vice versa.
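As a concrete illustration (our own Python sketch, not from the text; the name adjacency_list is ours), here is how Adj can be built for the undirected graph of Figure 22.1(a):

```python
def adjacency_list(n, edges, directed=False):
    """Build Adj: a dict mapping each vertex 1..n to its list of neighbors."""
    adj = {u: [] for u in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # undirected: u appears in v's list and vice versa
    return adj

# The undirected graph of Figure 22.1(a):
adj = adjacency_list(5, [(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)])
```

For this undirected graph the sum of the list lengths is 2|E| = 14, as claimed above.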
Figure 22.1 (an undirected graph) 22.1.2
1 2 3 4 5
(1)---(2) +-----------+
| /|\ 1| 0 1 0 0 1 |
| / | \ 2| 1 0 1 1 1 |
| / | (3) 3| 0 1 0 1 0 |
| / | / 4| 0 1 1 0 1 |
|/ |/ 5| 1 1 0 1 0 |
(5)---(4) +-----------+
(c)
(a) adjacency-matrix
Adj
+--+ +------+ +-------+
1| --->| 2 | --->| 5 | / |
+--+ +------+ +-------+ +------+ +-------+
2| --->| 1 | --->| 5 | --->| 3 | --->| 4 | / |
+--+ +------+ +-------+ +------+ +-------+
3| --->| 2 | --->| 4 | / |
+--+ +------+ +-------+ +-------+
4| --->| 2 | --->| 5 | --->| 3 | / |
+--+ +------+ +-------+ +-------+
5| --->| 4 | --->| 1 | --->| 2 | / |
+--+ +------+ +-------+ +-------+
(b) The adjacency-list representation of G.
Figure 22.2 (a directed graph) 22.1.3
1 2 3 4 5 6
(1)-->(2) (3) +-------------+
| ^| /| 1| 0 1 0 1 0 0 |
| || / | 2| 0 0 0 0 1 0 |
| /| / | 3| 0 0 0 0 1 1 |
| / | / | 4| 0 1 0 0 0 0 |
| / | / | 5| 0 0 0 1 0 0 |
| / || | 6| 0 0 0 0 0 1 |
V/ VV V +-------------+
(4)<--(5) (6) (c)
| ^ adjacency-matrix
(a) |_|
Adj
+--+ +-------+ +-------+
1| --->| 2 | --->| 4 | / |
+--+ +-------+ +-------+
2| --->| 5 | / |
+--+ +-------+ +-------+
3| --->| 6 | --->| 5 | / |
+--+ +-------+ +-------+
4| --->| 2 | / |
+--+ +-------+
5| --->| 4 | / |
+--+ +-------+
6| --->| 6 | / |
+--+ +-------+
(b) The adjacency-list representation of G.
22.1.4
Adjacency lists can be used to represent
weighted graphs, in which each edge has a
weight, usually given by a weight function
w: E --> R. The weight w(u,v) of the edge
(u,v) in E is stored with vertex v in u's
adjacency list.
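A sketch of the weighted variant (again our own naming, not the text's): each entry of Adj[u] pairs the neighbor v with the weight w(u,v), storing the weight with v in u's list as described above.

```python
def weighted_adjacency_list(n, weighted_edges, directed=True):
    """Adj[u] holds (v, w(u,v)) pairs: the weight is stored with v in u's list."""
    adj = {u: [] for u in range(1, n + 1)}
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        if not directed:
            adj[v].append((u, w))
    return adj
```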
A disadvantage of adjacency lists is that the
only way to determine if an edge (u,v) is in
the graph is to search for v in Adj[u], which
can be slow. This can be remedied by using
the adjacency-matrix representation -- at the
cost of using asymptotically more memory.
In the adjacency-matrix representation, we
assume that the vertices are numbered 1,2,...,
|V|. Then the adjacency-matrix representation
of G consists of a |V| x |V| matrix A = (a_ij)
such that a_ij = 1 if (i,j) is in E, and
a_ij = 0 otherwise.
Figures 22.1(c) and 22.2(c) are the adjacency-
matrix representations of the undirected and
directed graphs of Figures 22.1(a) and 22.2(a)
respectively. This representation requires
Theta(V^2) memory.
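The definition above translates directly into code; this illustrative Python sketch uses 1-indexed rows and columns to match the text (row and column 0 are unused):

```python
def adjacency_matrix(n, edges, directed=False):
    """(n+1) x (n+1) table; A[i][j] = 1 iff (i,j) is in E."""
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1  # an undirected graph's matrix is symmetric
    return A
```

Testing for edge (u,v) is then a single O(1) lookup A[u][v], in contrast to scanning Adj[u].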
The matrix A of the adjacency-matrix 22.1.5
representation of an undirected graph is
symmetric about the main diagonal (as is
Figure 22.1(c)), so A is equal to its
transpose. The transpose of matrix A = (a_ij)
is defined by A^T = (a^T_ij), where
a^T_ij = a_ji.
Thus we can cut the memory needs almost in
half, if needed, by storing only the entries
on and above the main diagonal.
Like the adjacency-list representation, the
adjacency-matrix representation can be used
for weighted graphs. We simply store the
weight w(u,v) of edge (u,v) in the entry in
row u and column v of the adjacency matrix.
If an edge does not exist, we can store NIL,
but often it is convenient to store 0 or
infinity.
For small graphs, the simplicity of the
adjacency-matrix representation may make it
preferable to the adjacency-list. Moreover,
if the graph is unweighted, there is an added
storage advantage to the adjacency-matrix
representation, in that each entry can be
represented by a single bit rather than an
entire word of memory.
22.2 Breadth-first search 22.2.1
Breadth-first search (BFS) is one of the
simplest graph-search algorithms, and many
others build on it: Prim's
minimum-spanning-tree algorithm and
Dijkstra's single-source shortest-paths
algorithm use similar ideas.
Given a graph G = (V,E) and a distinguished
source vertex s, breadth-first search explores
the edges of G to find each vertex that is
reachable from s. It computes the distance of
each reachable vertex from s, and produces a
"breadth-first tree" with root s that contains
all reachable vertices. A path from s to a
vertex v in that tree is a shortest path from
s to v. The algorithm works on both directed
and undirected graphs.
Breadth-first search is so named because it
expands the frontier between discovered and
undiscovered vertices uniformly across the
breadth of the frontier. So, the algorithm
discovers all vertices at distance k from s
before discovering any at distance k + 1.
Breadth-first search colors each vertex white,
gray, or black. All vertices start out white
and may later become gray and then black.
A vertex is discovered the first time it is
encountered in the search, at which time it is
colored gray. Thus gray and black vertices
have been discovered. All vertices adjacent
to a black vertex have been discovered; some of
those adjacent to a gray vertex have not.
BFS constructs a breadth-first tree 22.2.2
starting with the root s. Whenever a white
vertex v is found while scanning the adjacency
list of an already discovered vertex u, then v
and (u,v) are added to the tree. We say u is
the predecessor or parent of v in the tree.
BFS (below) assumes G = (V,E) is represented
by adjacency lists. It maintains several data
items with each vertex u:
color[u] = the color of u
pi[u] = the predecessor of u in the tree
d[u] = the distance from s to u
Also, BFS keeps a queue Q of gray vertices.
BFS(G,s)
1 for each vertex u in V[G] - {s}
2      do color[u] <- WHITE
3 d[u] <- infinity
4 pi[u] <- NIL
5 color[s] <- GRAY
6 d[s] <- 0
7 pi[s] <- NIL
8 Q <- phi |> Create an empty queue.
9 ENQUEUE(Q,s)
10 while Q not = phi
11 do u <- DEQUEUE(Q)
12 for each v in Adj[u]
13 do if color[v] = WHITE
14             then color[v] <- GRAY
15 d[v] <- d[u] + 1
16 pi[v] <- u
17 ENQUEUE(Q,v)
18 color[u] <- BLACK
Lines 1-9 of BFS do initialization. 22.2.3
The while loop of lines 10-18 iterates as long
as there are any gray vertices -- vertices u
that have been discovered but whose Adj[u]
list has not been fully examined. This loop
maintains the following invariant:
At line 10, Q contains all the gray vertices
We won't use it to prove correctness, but it
is easy to see it is true initially and each
iteration of the loop maintains it.
Figure 22.3 shows an example at the beginning
of each iteration of the while and at the end.
The breadth-first tree depends on the order
in which the vertices are visited in line 12,
but the distances d do not.
Analysis
We use aggregate analysis. After line 9, no
vertex is ever whitened, so each vertex is
enqueued at most once. Enqueuing & dequeuing
take O(1) time, so the total queue operation
time is O(V). Because Adj[u] is scanned only
when u is dequeued, each adjacency list is
scanned at most once, so the time spent in
scanning them is O(E). The initialization
cost is O(V), so BFS's total cost is O(V + E).
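The BFS pseudocode above translates into the following runnable Python sketch (our own transliteration; collections.deque supplies the FIFO queue Q):

```python
from collections import deque

WHITE, GRAY, BLACK = 'white', 'gray', 'black'

def bfs(adj, s):
    """BFS from source s over adjacency lists adj; returns (d, pi)."""
    color = {u: WHITE for u in adj}      # lines 1-4: initialize all vertices
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    color[s], d[s] = GRAY, 0             # lines 5-7: set up the source
    q = deque([s])                       # lines 8-9: Q holds the gray vertices
    while q:                             # lines 10-18
        u = q.popleft()
        for v in adj[u]:
            if color[v] == WHITE:        # v is discovered here
                color[v] = GRAY
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
        color[u] = BLACK                 # Adj[u] has been fully examined
    return d, pi
```

On the undirected graph of Figure 22.1(a) with s = 1, this yields d = {1:0, 2:1, 3:2, 4:2, 5:1}.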
Shortest paths 22.2.4
We claimed that d[v] is the distance from s
to a reachable vertex v. We define the
shortest-path distance delta(s,v) to be the
minimum number of edges in any path from s to
v; if there is no path, delta(s,v) = infinity.
A path of length delta(s,v) is said to be a
shortest path from s to v.
Lemmas 22.1, 22.2, 22.3, and Corollary 22.4
are used to prove:
Theorem 22.5 (Correctness of BFS)
Suppose G = (V,E) is a directed or undirected
graph, and that BFS is run on G from a given
source vertex s in V. Then BFS finds every
vertex v in V that is reachable from s, and
for all v in V, d[v] = delta(s,v). Moreover,
for any vertex v not = s that is reachable
from s, one of the shortest paths from s to v
is a shortest path from s to pi[v] followed
by the edge (pi[v],v).
Breadth-first trees
BFS(G,s) builds a breadth-first tree as it
searches the graph, as shown in Figure 22.3.
The tree is represented in the pi field in
each vertex. More formally, if G = (V,E) is a
graph with source s, we define the predecessor
subgraph of G as G_pi = (V_pi,E_pi), where
22.2.5
V_pi = {v in V : pi[v] not = NIL} U {s} and
E_pi = {(pi[v],v) : v is in V_pi - {s} }
Definition: G_pi is a breadth-first tree if
V_pi consists of the vertices reachable from s
&, for all v in V_pi, there is a unique simple
path from s to v in G_pi that is a shortest
path from s to v in G also. A breadth-first
tree is in fact a tree, since (Thm B.2, page
1085) it is connected and |E_pi| = |V_pi| - 1.
The edges in E_pi are called tree edges. And:
Lemma 22.6
When applied to a directed or undirected graph
G = (V,E), BFS constructs pi so that the
predecessor subgraph G_pi = (V_pi,E_pi) is a
breadth-first tree.
PRINT-PATH(G,s,v) below prints the vertices
on a shortest path from s to v, assuming that
BFS has already been run. PRINT-PATH runs in
linear time in the number of vertices printed.
PRINT-PATH(G,s,v)
1 if v = s
2 then print s
3 else if pi[v] = NIL
4 then print "no path from" s "to" v
5 else PRINT-PATH(G,s,pi[v])
6 print v
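A Python rendering of PRINT-PATH that returns the path as a list instead of printing it (a variant we introduce for convenience; None signals that no path exists):

```python
def path(pi, s, v):
    """Vertices on a shortest path from s to v, following pi pointers from BFS."""
    if v == s:
        return [s]
    if pi[v] is None:
        return None                              # no path from s to v
    rest = path(pi, s, pi[v])
    return None if rest is None else rest + [v]
```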
22.3 Depth-first search 22.3.1
As its name implies, depth-first search (DFS)
searches "deeper" in a graph whenever
possible. In DFS,
edges are explored from the most recently
discovered vertex v. When all of v's edges
have been explored, DFS "backtracks" to the
vertex from which v was discovered. When all
the vertices reachable from the source have
been discovered, DFS selects a new source (if
there are any) and repeats the search. This
is repeated until all vertices are discovered.
In DFS the predecessor subgraph may be a
forest of trees instead of one tree as in BFS.
The predecessor subgraph G_pi = (V,E_pi) is:
E_pi = {(pi[v],v) : v in V & pi[v] not = NIL}
The predecessor subgraph forms a depth-first
forest composed of several depth-first trees.
The edges in E_pi are called tree edges.
As in BFS, each vertex is colored: initially
white, gray when discovered, and finally black
when it is finished, i.e. its adjacency list
has been completely examined. This guarantees
that each vertex is in exactly one depth-first
tree and that these trees are disjoint.
DFS also timestamps each vertex v: d[v] is
the discovery time (when v is grayed) and f[v]
is the finishing time (when v is blackened).
Timestamps are used in many graph algorithms.
Below is the basic DFS algorithm. 22.3.2
It works on directed & undirected graphs. The
global variable time is used for timestamping.
DFS(G)
1 for each vertex u in V[G]
2 do color[u] <- WHITE
3 pi[u] <- NIL
4 time <- 0
5 for each vertex u in V[G]
6 do if color[u] = WHITE
7 then DFS-VISIT(u)
DFS-VISIT(u)
1 color[u] <- GRAY |> u has been discovered
2 time <- time + 1
3 d[u] <- time
4 for each v in Adj[u] |> Explore edge (u,v)
5 do if color[v] = WHITE
6 then pi[v] <- u
7 DFS-VISIT(v)
8 color[u] <- BLACK |> Since u is finished
9 f[u] <- time <- time + 1
Figure 22.4 (page 542) shows the progress of
DFS(G) on the graph of Figure 22.2 (page 528).
The results of DFS depend on the order in
which the vertices are visited in line 4 of
DFS(G) and line 5 of DFS-VISIT. But the
differences are not important in practice.
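A direct Python transliteration of DFS and DFS-VISIT (our own sketch; a nonlocal counter plays the role of the global variable time):

```python
WHITE, GRAY, BLACK = 'white', 'gray', 'black'

def dfs(adj):
    """DFS over every vertex of adj; returns discovery times d,
    finishing times f, and predecessors pi."""
    color = {u: WHITE for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):                # DFS-VISIT(u)
        nonlocal time
        color[u] = GRAY          # u has been discovered
        time += 1
        d[u] = time
        for v in adj[u]:         # explore edge (u,v)
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK         # u is finished
        time += 1
        f[u] = time

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return d, f, pi
```

On the directed graph of Figure 22.2(a), visiting vertices and adjacency lists in numeric order gives d[1] = 1, f[1] = 8, d[3] = 9, f[3] = 12, matching Figure 22.4.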
We use aggregate analysis to find 22.3.3
the running time of DFS. The loops of lines
1-3 and 5-7 of DFS take Theta(V) time.
DFS-VISIT is called exactly once for each
vertex -- when it is white. In DFS-VISIT(v),
the loop in lines 4-7 is executed |Adj[v]|
times, so the total cost of those lines is:
Sum_{v in V} |Adj[v]| = Theta(E)
Thus, the running time of DFS is Theta(V + E).
Properties of depth-first search
By examining the DFS-VISIT pseudocode, we
see that u = pi[v] if and only if DFS-VISIT(v)
was called during a search of u's adjacency
list. Also, v is a descendant of u in the
depth-first forest if and only if v was
discovered during the time u was gray.
Also the discovery and finishing times have a
parenthesis structure: if we represent the
discovery of u with "(u" and its finishing
with "u)", the parentheses will be properly
nested, as in Figure 22.5(b) (page 544) when
DFS is run on 22.5(a). Formally, we have the
Theorem 22.7 (Parenthesis theorem)
In any depth-first search of a (directed or
undirected) graph, for any 2 vertices u and v,
exactly one of these three conditions holds:
22.3.4
- the time intervals [d[u],f[u]] & [d[v],f[v]]
are disjoint, and neither u nor v is a
descendant of the other in the depth-first
forest,
- [d[u],f[u]] is entirely contained in
[d[v],f[v]], and u is a descendant of v in a
depth-first tree, or
- [d[v],f[v]] is entirely contained in
[d[u],f[u]], and v is a descendant of u in a
depth-first tree.
Corollary 22.8 (Nesting of descendants'
intervals)
Vertex v is a proper descendant of u in the
depth-first forest if and only if
d[u] < d[v] < f[v] < f[u].
Theorem 22.9 (White-path theorem)
In a depth-first forest, v is a descendant of
u if and only if at time d[u], vertex v can be
reached from vertex u along a path consisting
entirely of white vertices.
Classification of edges 22.3.5
One useful property of DFS is that it can be
used to classify the edges of G = (V,E). This
classification can be used to obtain useful
information about a graph. For example, a
directed graph is acyclic if and only if DFS
yields no "back" edges (Lemma 22.11 page 550).
We can define four types of edges in terms of
the depth-first forest G_pi produced by DFS.
1. Tree edges are edges in G_pi. Edge (u,v)
is a tree edge if v was first discovered by
exploring edge (u,v).
2. Back edges are those edges (u,v) connecting
u to an ancestor v in a depth-first tree.
Self-loops, which may occur in directed
graphs, are considered to be back edges.
3. Forward edges are nontree edges (u,v)
connecting u to a descendant v in a
depth-first tree.
4. Cross edges are all other edges. They can
go between vertices of the same depth-first
tree if one is not an ancestor of the other,
or they can go between different trees.
The edges in Figures 22.4 and 22.5 are labeled
this way. Figure 22.5(c) redraws 22.5(a) with
back edges going up & forward ones going down.
22.3.6
DFS(G) can be modified to classify edges as
it encounters them. Mostly (u,v) can be
classified according to the color of v when it
is first explored:
1. WHITE indicates a tree edge,
2. GRAY indicates a back edge since v stays
gray precisely during the time its
descendants are being discovered, and
3. BLACK indicates a forward or cross edge.
In the third case, an edge (u,v) is a forward
edge if d[u] < d[v] and a cross edge if
d[u] > d[v] (see Exercise 22.3-4 page 548).
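These rules can be folded directly into DFS, as in the following sketch for directed graphs (our own illustrative code; classify_edges is not a name from the text):

```python
def classify_edges(adj):
    """DFS over a directed graph, labeling each edge (u,v) as tree, back,
    forward, or cross by the color of v (and discovery times) when explored."""
    WHITE, GRAY, BLACK = 'white', 'gray', 'black'
    color = {u: WHITE for u in adj}
    d = {}
    time = 0
    kind = {}

    def visit(u):
        nonlocal time
        color[u] = GRAY
        time += 1
        d[u] = time
        for v in adj[u]:
            if color[v] == WHITE:
                kind[(u, v)] = 'tree'       # rule 1: white means tree edge
                visit(v)
            elif color[v] == GRAY:
                kind[(u, v)] = 'back'       # rule 2: gray means back edge
            else:                            # rule 3: black means forward/cross,
                kind[(u, v)] = 'forward' if d[u] < d[v] else 'cross'
        color[u] = BLACK

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return kind
```

On the graph of Figure 22.2(a) this reproduces the labels of Figure 22.4: (4,2) and the self-loop (6,6) are back edges, (1,4) is a forward edge, and (3,5) is a cross edge.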
In an undirected graph, we classify the edge
(u,v) to be the first type that applies -- to
resolve the ambiguity that (u,v) and (v,u) are
the same. Or equivalently, we classify the
edge according to which of (u,v) or (v,u) is
encountered first in DFS.
The following shows that forward and cross
edges never occur in the DFS of an undirected
graph.
Theorem 22.10
In the DFS of an undirected graph G, every
edge is either a tree edge or a back edge.
22.4 Topological sort 22.4.1
Given a directed acyclic graph, or "dag",
G = (V,E), we can use DFS to produce a
topological sort of G: a linear ordering of
the vertices
such that if (u,v) is an edge of G, then u
appears before v in the ordering. We can view
the ordering by placing the vertices along a
horizontal line so that all directed edges go
from left to right (as in Figure 22.7(b)).
Dags are often used to indicate precedences
among events. Figure 22.7 (page 550) gives an
example of Prof. Bumstead getting dressed, in
which a directed edge (u,v) in Figure 22.7(a)
indicates that he must put on garment u before
garment v. A topological sort of this dag
yields an ordering for getting dressed. The
topologically sorted vertices appear in the
reverse order of their DFS finishing times as
produced by the following algorithm.
TOPOLOGICAL-SORT(G)
1 call DFS(G) to compute finishing times f[v]
for each vertex v
2 as each vertex is finished, insert it onto
the front of a linked list
3 return the linked list of vertices
It runs in Theta(V + E) time, since DFS takes
Theta(V + E) time and it takes O(1) time to
insert each of |V| vertices into the list.
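The three lines above amount to a DFS whose finish action prepends the vertex; a Python sketch (our own, using a plain list in place of the linked list):

```python
def topological_sort(adj):
    """Return the vertices of dag adj in topologically sorted order:
    each vertex is inserted at the front of the list as it finishes in DFS."""
    WHITE, GRAY, BLACK = 'white', 'gray', 'black'
    color = {u: WHITE for u in adj}
    order = []

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                visit(v)
        color[u] = BLACK
        order.insert(0, u)   # "insert it onto the front of a linked list"
    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return order
```

(insert(0, u) on a Python list costs O(n) per call; appending and reversing at the end would match the O(1)-per-insert linked list, but the version above mirrors the pseudocode most directly.)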
Figure 22.7(a) 22.4.2
+-----------+ +-----+
11/16 |undershorts|________ |socks| 17/18
+-----------+ \ +-----+
| \____ |
| | |
V V V
+-----+ +-----+
12/15 |pants|----------------->|shoes| 13/14
+-----+ +-----+ +-----+
| ____|shirt| 1/8
| | +-----+
| | |
V V V
+----+ +---+ +-----+
6/7 |belt|_ |tie| 2/5 |watch| 9/10
+----+ \_ +---+ +-----+
| |
V V
+------+
|jacket| 3/4
+------+
Figure 22.7(b) 22.4.3
The same garments arranged along a horizontal
line in topologically sorted order (decreasing
finishing time), so that all directed edges go
from left to right:
socks (17/18), undershorts (11/16),
pants (12/15), shoes (13/14), watch (9/10),
shirt (1/8), belt (6/7), tie (2/5),
jacket (3/4)
We prove correctness of TOPOLOGICAL-SORT with
the following lemma that characterizes dags.
Lemma 22.11
A directed graph G is acyclic if and only if
( <==> ) a DFS of G yields no back edges.
Proof: ==>: If (u,v) is a back edge, then v is
an ancestor of u. Thus, there must be a path
from v to u in G, & (u,v) completes the cycle.
<==: If G contains a cycle c, 22.4.4
we show that DFS yields a back edge. Let v be
the first vertex discovered in c, and let
(u,v) be the preceding edge in c. At time
d[v], the vertices of c form a path of white
vertices from v to u. By the white-path
theorem, u becomes a descendant of v in the
depth-first forest. Therefore (u,v) is a back
edge.
Theorem 22.12
TOPOLOGICAL-SORT(G) produces a topological
sort of a directed acyclic graph G.
Proof: Suppose that DFS is run on a given dag
G = (V,E) to determine finishing times. We
need to show that for any edge (u,v) in E
where u and v are distinct, f[v] < f[u]. When
(u,v) is explored, v cannot be gray, since
then v would be an ancestor of u and (u,v)
would be a back edge, which contradicts Lemma
22.11. So, v must be either white or black.
If v is white, it becomes a descendant of u,
and so f[v] < f[u]. If v is black, it has
already been finished, so f[v] has already
been set. Since we are still exploring from u,
we have not yet set f[u], which happens later,
so that f[v] < f[u] in this case also. Thus,
for any edge (u,v), we have f[v] < f[u],
proving the theorem.