Chapter 23 Minimum Spanning Trees

We are given a connected, undirected graph G = (V,E) with a weight w(u,v) for each edge (u,v). We often want to find an acyclic subset T of E that connects all the vertices and whose total weight w(T), the sum of the weights of its edges, is minimum. Since T is acyclic and connects all the vertices, it forms a tree, which we call a spanning tree. The problem of finding T is the minimum-spanning-tree (MST) problem.

Figure 23.1 (page 625) shows an example of a minimum spanning tree. The graph has vertices a through i. Its edges and weights are listed below, and the edges of the minimum spanning tree shown in the figure are marked with *:

  (a,b) 4 *    (a,h) 8      (b,c) 8 *    (b,h) 11     (c,d) 7 *
  (c,f) 4 *    (c,i) 2 *    (d,e) 9 *    (d,f) 14     (e,f) 10
  (f,g) 2 *    (g,h) 1 *    (g,i) 6      (h,i) 7

The total weight of this spanning tree is 37. It is not unique: replacing edge (b,c) with edge (a,h) gives another spanning tree of weight 37.

We examine two greedy algorithms for finding T: Kruskal's algorithm and Prim's algorithm. Each can be made to run in O(E lg V) time using standard data structures: sorting plus a disjoint-set forest for Kruskal's, and a binary heap for Prim's. By using Fibonacci heaps, Prim's algorithm can be sped up to O(E + V lg V) time, which is an improvement when |V| is much smaller than |E|. Section 23.1 presents a generic MST algorithm, and Section 23.2 presents Kruskal's and Prim's algorithms.

23.1 Growing a minimum spanning tree

The following generic algorithm grows the tree one edge at a time. It manages a set of edges A and maintains the following loop invariant:

  Prior to each iteration, A is a subset of some minimum spanning tree.

At each step, we find a "safe edge" (u,v). This is an edge that can be added to A without violating the invariant, so that A Union {(u,v)} is also a subset of some minimum spanning tree.

GENERIC-MST(G,w)
1  A = ∅
2  while A does not form a spanning tree
3      find an edge (u,v) that is safe for A
4      A = A Union {(u,v)}
5  return A

Initialization: After line 1, the empty set A trivially satisfies the loop invariant.

Maintenance: The loop in lines 2-4 maintains the invariant by adding only safe edges. Such an edge always exists: while A is a proper subset of some MST T, every edge of T - A is safe for A.

Termination: When the loop ends, A forms a spanning tree. By the invariant, A is also a subset of some minimum spanning tree. Both trees have |V| - 1 edges, so a spanning tree contained in another spanning tree must equal it. Therefore the set A returned in line 5 is a minimum spanning tree.

Theorem 23.1 tells us how to recognize safe edges, but first we need some terminology. A cut (S, V - S) of an undirected graph G = (V,E) is a partition of V (see Figure 23.2). We say an edge (u,v) crosses the cut if one of its endpoints is in S and the other is in V - S.
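The definition of a crossing edge can be made concrete with a small Python sketch. It encodes the weighted graph of Figure 23.1 as an edge list, with the weights read off the figure. It then lists the edges crossing a given cut and picks a minimum-weight one among them. The names EDGES, crossing_edges and min_crossing_edge are illustrative and do not come from the text. Vertex sets are assumed to be Python sets of single-letter labels.

```python
# Undirected weighted graph of Figure 23.1 as (u, v, weight) triples
# (weights as read off the figure; the list itself is an illustrative encoding).
EDGES = [
    ("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
    ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
    ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
    ("g", "i", 6), ("h", "i", 7),
]


def crossing_edges(edges, s):
    """Return the edges with exactly one endpoint in s, i.e. crossing (s, V - s)."""
    return [(u, v, w) for (u, v, w) in edges if (u in s) != (v in s)]


def min_crossing_edge(edges, s):
    """Return a minimum-weight edge crossing the cut (s, V - s), or None if none crosses."""
    crossing = crossing_edges(edges, s)
    if not crossing:
        return None
    # min() returns the first minimum in list order when several edges tie.
    return min(crossing, key=lambda e: e[2])


if __name__ == "__main__":
    s = {"a", "b", "d", "e"}
    print(crossing_edges(EDGES, s))
    print(min_crossing_edge(EDGES, s))
```

Consider the cut with S = {a, b, d, e}. Six edges cross it, and (c,d) with weight 7 is the lightest of them. If S = V, no edge crosses the cut, so the sketch returns None.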
We say a cut respects a set A of edges if no edge in A crosses the cut. An edge is a light edge crossing a cut if its weight is the minimum weight of any edge crossing the cut. In the case of ties, more than one light edge may cross a cut. More generally, we say an edge is a light edge satisfying a given property if its weight is the minimum of any edge satisfying that property.

Theorem 23.1
Let G = (V,E) be a connected, undirected graph with a real-valued weight function w defined on E. Let A be a subset of E that is included in some MST for G. Let (S, V - S) be any cut of G that respects A, and let (u,v) be a light edge crossing (S, V - S). Then (u,v) is safe for A.

Proof: Let T be an MST that includes A. Assume that T does not contain the light edge (u,v), since if it does, we are done. We shall construct another MST T' that includes A Union {(u,v)}, thus showing that (u,v) is safe for A.

The edge (u,v) forms a cycle with the simple path p from u to v in T, as shown in Figure 23.3. Since u and v are on opposite sides of the cut (S, V - S), at least one edge (x,y) of T on path p also crosses the cut. The edge (x,y) is not in A, because the cut respects A. Since (x,y) is on the unique simple path from u to v in T, removing (x,y) breaks T into two components. Adding (u,v) reconnects them to form a new spanning tree:

  T' = (T - {(x,y)}) Union {(u,v)}

We now show that T' is an MST. Since (u,v) is a light edge crossing (S, V - S) and (x,y) also crosses this cut, w(u,v) <= w(x,y). Therefore

  w(T') = w(T) - w(x,y) + w(u,v) <= w(T).

But T is an MST, so w(T) <= w(T'). Thus T' must be an MST as well.

It remains to show that (u,v) is actually safe for A. A is a subset of T' because A is a subset of T and (x,y) is not in A. Thus A Union {(u,v)} is a subset of T'. Since T' is an MST, (u,v) is safe for A.

Corollary 23.2
Let G = (V,E) and w be as above. Let A be a subset of E that is included in some MST for G, and let C = (V_C, E_C) be a connected component (tree) in the forest G_A = (V,A).
If (u,v) is a light edge connecting C to some other component in G_A, then (u,v) is safe for A.

Proof: The cut (V_C, V - V_C) respects A, and (u,v) is a light edge for this cut. Therefore, by Theorem 23.1, (u,v) is safe for A.

23.2 The algorithms of Kruskal and Prim

Kruskal's algorithm

Kruskal's algorithm finds a safe edge to add to the growing forest in the following way. Among all edges that connect two distinct trees in the forest, it chooses an edge (u,v) of least weight. Let C1 and C2 be the two trees that (u,v) connects. Since (u,v) must be a light edge connecting C1 to some other tree, Corollary 23.2 implies that (u,v) is safe for A. Kruskal's algorithm is a greedy algorithm, because at each step it adds to the forest an edge of least possible weight.

The algorithm uses a disjoint-set data structure (see Chapter 21) in which each set contains the vertices of one tree of the current forest. FIND-SET(u) returns a representative element from the set containing u. Two vertices u and v therefore belong to the same tree exactly when FIND-SET(u) = FIND-SET(v). The UNION procedure combines two trees. An example is shown in Figure 23.4.

MST-KRUSKAL(G,w)
1  A = ∅
2  for each vertex v in G.V
3      MAKE-SET(v)
4  sort the edges of G.E into nondecreasing order by weight w
5  for each edge (u,v) in G.E, taken in nondecreasing order by weight
6      if FIND-SET(u) ≠ FIND-SET(v)
7          A = A Union {(u,v)}
8          UNION(u,v)
9  return A

The running time of Kruskal's algorithm depends on how the disjoint-set data structure is implemented. We assume the disjoint-set-forest implementation of Section 21.3 with the union-by-rank and path-compression heuristics, since it is the asymptotically fastest implementation known. Line 1 takes O(1) time, and sorting the edges in line 4 takes O(E lg E) time. The for loop of lines 5-8 performs O(E) FIND-SET and UNION operations on the disjoint-set forest. Together with the |V| MAKE-SET operations, these take a total of O((V + E) alpha(V)) time, where alpha is the very slowly growing function defined in Section 21.4. Because we assume that G is connected, |E| >= |V| - 1, and so the disjoint-set operations take O(E alpha(V)) time. Moreover, since alpha(|V|) = O(lg V) = O(lg E), the total running time of Kruskal's algorithm is O(E lg E).
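MST-KRUSKAL can be sketched in Python as follows. The sketch assumes the graph is given as a list of vertices and a list of (u, v, weight) triples. The disjoint-set forest uses union by rank and path compression, the Section 21.3 heuristics that the analysis above assumes. The helper names find_set and union mirror FIND-SET and UNION but are otherwise illustrative. The test graph is Figure 23.1 with the weights read off the figure.

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm; returns (list of MST edges, total weight).

    Assumes a connected undirected graph given as (u, v, weight) triples.
    """
    parent = {v: v for v in vertices}  # lines 2-3: MAKE-SET(v) for every vertex
    rank = {v: 0 for v in vertices}

    def find_set(x):
        # Walk up to the root, then compress the path so each node points at the root.
        root = x
        while parent[root] != root:
            root = parent[root]
        while x != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(x, y):
        # Union by rank: hang the root of lower rank under the root of higher rank.
        rx, ry = find_set(x), find_set(y)
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1

    mst, total = [], 0
    # Lines 4-5: scan edges in nondecreasing order of weight (sorted() is stable).
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if find_set(u) != find_set(v):  # line 6: u and v lie in different trees
            mst.append((u, v, w))       # line 7
            total += w
            union(u, v)                 # line 8
    return mst, total


# Figure 23.1 as (u, v, weight) triples (illustrative encoding).
EDGES = [
    ("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
    ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
    ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
    ("g", "i", 6), ("h", "i", 7),
]

if __name__ == "__main__":
    mst, total = kruskal(list("abcdefghi"), EDGES)
    print(total)
    print(mst)
```

On this graph the sketch reports a total weight of 37. The stable sort keeps equal-weight edges in input order, so (a,h) is considered before (b,c). As a result, the sketch returns the other weight-37 tree mentioned at the start of the chapter, the one that uses (a,h) in place of (b,c).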
Observing that |E| < |V|^2, we have lg |E| = O(lg V), so we can restate the running time of Kruskal's algorithm as O(E lg V).

Prim's algorithm

In Prim's algorithm, the edges in the set A always form a single tree. The safe edge added to A is a light edge connecting the tree to a vertex not in the tree. The tree starts from an arbitrary root vertex r and grows until it spans all the vertices in V. By Corollary 23.2, this rule adds only edges that are safe for A, so when the algorithm terminates, the edges in A form an MST. This strategy is greedy, since at each step the tree is augmented by an edge that contributes the minimum possible amount to its weight. Figure 23.5 (page 635) illustrates the algorithm.

Prim's algorithm uses a min-priority queue Q, keyed on the key attribute, to hold all the vertices that are not yet in the tree. For each vertex v, v.key is the minimum weight of any edge connecting v to a vertex in the tree (v.key = infinity if no such edge exists), and v.pi is the parent of v in the tree. The set A is kept implicitly as

  A = {(v, v.pi) : v is in V - {r} - Q}.

When the algorithm terminates, Q is empty, and the MST A for G is thus

  A = {(v, v.pi) : v is in V - {r}}.

MST-PRIM(G,w,r)
1   for each u in G.V
2       u.key = infinity
3       u.pi = NIL
4   r.key = 0
5   Q = G.V
6   while Q ≠ ∅
7       u = EXTRACT-MIN(Q)
8       for each v in G.Adj[u]
9           if v in Q and w(u,v) < v.key
10              v.pi = u
11              v.key = w(u,v)

The algorithm maintains the following three-part loop invariant for the while loop of lines 6-11. Prior to each iteration:

1. A = {(v, v.pi) : v is in V - {r} - Q}.
2. The vertices already placed into the MST are those in V - Q.
3. For all vertices v in Q, if v.pi ≠ NIL, then v.key < infinity and v.key is the weight of a light edge (v, v.pi) connecting v to some vertex already placed into the MST.

Line 7 identifies a vertex u in Q that is incident on a light edge crossing the cut (V - Q, Q). The exception is the first iteration, in which u = r because r.key = 0. Removing u from Q adds it to the set V - Q of vertices in the tree, thus adding (u, u.pi) to A. The for loop of lines 8-11 updates the key and pi attributes of every vertex v adjacent to u but not in the tree, thereby maintaining part 3 of the invariant.
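A direct Python transcription of MST-PRIM is sketched below. To keep it short, Q is a plain set and EXTRACT-MIN is a linear scan over Q. This sketch therefore runs in O(V^2 + E) time rather than the heap-based bounds. The adjacency-map representation (adj[u][v] = w(u,v)) and the helper names build_adj and tree_weight are assumptions made for illustration. Python's None stands in for NIL.

```python
import math


def build_adj(edges):
    """Build an undirected adjacency map adj[u][v] = w(u, v) from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def mst_prim(adj, r):
    """Prim's algorithm following MST-PRIM; returns the parent map pi."""
    key = {u: math.inf for u in adj}      # lines 1-2
    pi = {u: None for u in adj}           # line 3: None plays the role of NIL
    key[r] = 0                            # line 4
    q = set(adj)                          # line 5: Q holds every vertex
    while q:                              # line 6
        u = min(q, key=lambda x: key[x])  # line 7: EXTRACT-MIN by linear scan
        q.remove(u)
        for v, w in adj[u].items():       # line 8
            if v in q and w < key[v]:     # line 9: set membership is O(1) on average
                pi[v] = u                 # line 10
                key[v] = w                # line 11
    return pi


def tree_weight(adj, pi):
    """Total weight of the edges (v, v.pi) over all non-root vertices v."""
    return sum(adj[v][p] for v, p in pi.items() if p is not None)


# Figure 23.1 as (u, v, weight) triples (illustrative encoding).
EDGES = [
    ("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
    ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
    ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
    ("g", "i", 6), ("h", "i", 7),
]

if __name__ == "__main__":
    adj = build_adj(EDGES)
    pi = mst_prim(adj, "a")
    print(pi)
    print(tree_weight(adj, pi))
```

Starting from root a, tree_weight reports 37 for the Figure 23.1 graph. Several vertices in Q may tie for the minimum key. Which one the scan picks then depends on set iteration order, so the particular tree may differ between runs, but its weight does not.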
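If Q is implemented as a binary min-heap, we can use the BUILD-MIN-HEAP procedure to perform lines 1-5 in O(V) time. The body of the while loop executes |V| times. Each EXTRACT-MIN operation takes O(lg V) time, so the total time for all calls to EXTRACT-MIN is O(V lg V). The for loop of lines 8-11 executes O(E) times altogether, since the sum of the lengths of all adjacency lists is 2|E|. Within the for loop, the test for membership in Q in line 9 can be done in constant time. To do this, we keep a bit for each vertex that records whether it is in Q, and we update the bit when the vertex is removed from Q. The assignment in line 11 involves an implicit DECREASE-KEY operation on the min-heap, which takes O(lg V) time. Thus the total running time of Prim's algorithm is O(V lg V + E lg V) = O(E lg V), which is asymptotically the same as for our implementation of Kruskal's algorithm.

We can do better by using a Fibonacci heap. There, EXTRACT-MIN takes O(lg V) amortized time and DECREASE-KEY takes only O(1) amortized time. Thus the total running time of Prim's algorithm improves to O(E + V lg V).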
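Python's standard heapq module provides a binary min-heap but no DECREASE-KEY operation. The sketch below therefore substitutes a common alternative, lazy deletion. When a vertex's key drops, a new (key, vertex) entry is pushed, and outdated entries are skipped when they are popped. The heap then holds O(E) entries instead of at most |V|. Each heap operation costs O(lg E) = O(lg V), so the total remains O(E lg V). This is a stand-in for the DECREASE-KEY version analyzed above, not a transcription of it. Vertex labels are assumed to be mutually comparable (here, strings) so that heap entries with equal keys can be ordered.

```python
import heapq
import math


def build_adj(edges):
    """Build an undirected adjacency map adj[u][v] = w(u, v) from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def prim_heap(adj, r):
    """Prim's algorithm with heapq and lazy deletion; returns (pi, total weight)."""
    key = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    in_tree = set()             # vertices already extracted, i.e. V - Q
    key[r] = 0
    heap = [(0, r)]
    total = 0
    while heap:
        k, u = heapq.heappop(heap)
        if u in in_tree or k > key[u]:
            continue            # stale entry: u already extracted or its key was lowered
        in_tree.add(u)          # u joins the tree via edge (u, pi[u])
        total += k
        for v, w in adj[u].items():
            if v not in in_tree and w < key[v]:
                key[v] = w
                pi[v] = u
                heapq.heappush(heap, (w, v))  # pushed instead of DECREASE-KEY
    return pi, total


# Figure 23.1 as (u, v, weight) triples (illustrative encoding).
EDGES = [
    ("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
    ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
    ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
    ("g", "i", 6), ("h", "i", 7),
]

if __name__ == "__main__":
    pi, total = prim_heap(build_adj(EDGES), "a")
    print(total)
    print(pi)
```

Keys only decrease, so a popped entry whose key exceeds the vertex's current key is always out of date and can be discarded safely. The trade-off is extra heap entries, which cost memory and some constant-factor time. In exchange, there is no need for the position bookkeeping that an indexed heap requires to support DECREASE-KEY.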