Chapter 16 Greedy Algorithms

16.0.1 Each step of an optimization algorithm often has a set of options. We _could_ examine each one, as in dynamic programming. But for some problems, always picking the choice that looks best at the moment gives the right answer.

Def. A greedy algorithm is one that always makes the choice that seems best at the moment.

Not all optimization problems can be solved by greedy algorithms, of course. Section 16.1 presents the activity-selection problem, which can be solved by a greedy algorithm. This is done by first finding a dynamic programming solution and then showing that the greedy choice works. Section 16.2 reviews the theory of greedy algorithms, including proofs of correctness. Section 16.3 treats an application of greedy techniques: Huffman codes. Section 16.4 deals with matroids, for which greedy algorithms always produce an optimal solution. Section 16.5 presents an application of matroids. The minimum-spanning-tree algorithms (Kruskal and Prim) of Chapter 23, Dijkstra's single-source shortest-path algorithm, and Chvatal's greedy set-covering heuristic (Chapter 35) are all examples of greedy algorithms.

16.1 An activity-selection problem

16.1.1 Problem: we have a set S = {a_1, a_2, ..., a_n} of activities that wish to use a resource that can be used by only one activity at a time. Each activity has a start time s_i and a finish time f_i, with 0 <= s_i < f_i < infinity. Two activities a_i and a_j are compatible if f_i <= s_j or f_j <= s_i. The problem: find a maximum-size subset of mutually compatible activities.

It will be an advantage to order the activities by increasing finish times:

    f_0 <= f_1 <= f_2 <= ... <= f_n < f_(n+1)                      (1)

Here is an example:

    i   |  1  2  3  4  5  6  7  8  9 10 11
    ----+---------------------------------
    s_i |  1  3  0  5  3  5  6  8  8  2 12
    f_i |  4  5  6  7  8  9 10 11 12 13 14

The set {a_3, a_9, a_11} is mutually compatible but not of maximum size; {a_1, a_4, a_8, a_11} is a maximum-size mutually compatible set, and so is {a_2, a_4, a_9, a_11}.
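As a quick sanity check on the example, a short Python sketch (the helper names here are ours, not from the text) verifies mutual compatibility of the sets above:

```python
from itertools import combinations

# Activities from the example table: i -> (s_i, f_i)
acts = {1: (1, 4), 2: (3, 5), 3: (0, 6), 4: (5, 7), 5: (3, 8), 6: (5, 9),
        7: (6, 10), 8: (8, 11), 9: (8, 12), 10: (2, 13), 11: (12, 14)}

def compatible(a, b):
    """a and b are (start, finish) pairs; compatible means the
    half-open intervals [s, f) do not overlap."""
    return a[1] <= b[0] or b[1] <= a[0]

def mutually_compatible(indices):
    """Check pairwise compatibility of a set of activity indices."""
    return all(compatible(acts[i], acts[j])
               for i, j in combinations(indices, 2))

print(mutually_compatible([3, 9, 11]))     # True, but only 3 activities
print(mutually_compatible([1, 4, 8, 11]))  # True, and of maximum size
```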
16.1.2 The optimal substructure of the activity-selection problem

We solve the problem in several steps. First we think of a dynamic programming solution in which we combine subproblem solutions. But it turns out we only have to make one choice, the greedy one, leaving only one remaining subproblem. This allows us to design a recursive solution, which we then convert to an iterative one. The process illustrates the relation between dynamic programming and greedy algorithms.

In dynamic programming we first find the optimal substructure of an optimal solution. Define the following sets:

    S_ij = {a_k in S : f_i <= s_k < f_k <= s_j}
    A_ij = a maximum-size set of mutually compatible activities in S_ij

i.e., S_ij is the subset of activities that start after a_i finishes and finish before a_j starts. Suppose A_ij contains a_k, and let A_ik = A_ij intersect S_ik and A_kj = A_ij intersect S_kj. Then A_ij = A_ik U {a_k} U A_kj, and so the maximum-size set of activities in S_ij consists of

    |A_ij| = |A_ik| + 1 + |A_kj|

activities. The usual cut-and-paste argument shows that A_ik and A_kj must be optimal solutions for S_ik and S_kj: if A_ik' were a larger set of compatible activities from S_ik, then |A_ik'| + 1 + |A_kj| > |A_ik| + 1 + |A_kj| = |A_ij|, contradicting the maximality of A_ij; a similar argument applies to A_kj.

16.1.3 This characterization of optimal substructure suggests the use of dynamic programming. Let c[i,j] be the size of an optimal subset of S_ij, which gives the recurrence c[i,j] = c[i,k] + c[k,j] + 1. Of course we have to check all possibilities for a_k to obtain an optimal solution, i.e.:

    c[i,j] = 0                                        if S_ij = phi
    c[i,j] = max over a_k in S_ij of
             { c[i,k] + c[k,j] + 1 }                  if S_ij != phi

From this we could write a recursive algorithm and memoize it, or write a bottom-up solution. But we can do better by using a greedy choice.

Making the greedy choice

We could avoid having to consider all choices in the formula above if we could choose an activity without first having to solve all the subproblems.
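Before moving on, the recurrence can be turned directly into a memoized recursive algorithm. A sketch in Python (not the text's code; the sentinel activities a_0 and a_(n+1) with f_0 = 0 and s_(n+1) = infinity are the standard trick for making S_{0,n+1} cover the whole input):

```python
from functools import lru_cache
from math import inf

# Running example, 1-indexed; positions 0 and n+1 hold fictitious
# sentinel activities with f_0 = 0 and s_{n+1} = infinity.
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12, inf]
f = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, inf]
n = 11

@lru_cache(maxsize=None)
def c(i, j):
    """Size of a maximum-size set of mutually compatible activities
    in S_ij, computed straight from the recurrence."""
    best = 0
    for k in range(i + 1, j):
        if f[i] <= s[k] and f[k] <= s[j]:  # a_k lies in S_ij
            best = max(best, c(i, k) + c(k, j) + 1)
    return best

print(c(0, n + 1))  # 4: the size of the maximum-size sets found earlier
```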
In fact we need consider only one choice: the greedy choice. For this problem, a greedy choice is the activity that finishes first. Since the activities are ordered by increasing finish times, the first greedy choice is a_1. After choosing a_1, we only have to solve the subproblem of finding a maximum-size subset of activities starting after f_1 (and there are no activities that end before s_1, since f_1 is the earliest finish time). Let

    S_k = {a_i in S : s_i >= f_k}.

Then if we choose a_1, S_1 is the only subproblem left to solve, so an optimal solution consists of a_1 together with an optimal solution to S_1. Is this correct? The following theorem says it is.

16.1.4 Theorem 16.1
Let S_k be non-empty and let a_m be an activity in S_k with the earliest finish time. Then a_m is used in _some_ maximum-size subset of mutually compatible activities of S_k.

Proof: Let A_k be a maximum-size subset of compatible activities in S_k, and let a_j be the activity in A_k with the earliest finish time. If a_j = a_m, we are done; otherwise let A_k' = (A_k - {a_j}) U {a_m}. The activities in A_k' are disjoint, since the activities in A_k are, a_j is the first of them to finish, and f_m <= f_j. Since |A_k'| = |A_k|, A_k' is a maximum-size subset of compatible activities of S_k, and it includes a_m.

So even though we could use dynamic programming to solve the activity-selection problem, we don't need to (and we haven't proved it has the overlapping-subproblems property). Since the activities are in order of increasing finish times, we only have to consider each activity once. Also, we can use a top-down algorithm, since in choosing a_m we don't have to know the solutions to subproblems first. Greedy algorithms typically have this top-down structure.

16.1.5 A recursive greedy algorithm

The procedure RECURSIVE-ACTIVITY-SELECTOR has as input the arrays of start and finish times, the index k that defines subproblem S_k, and the size n of the original problem. It returns a maximum-size subset of mutually compatible activities in S_k.
We assume the n activities are ordered by increasing finish times (if not, we sort them in O(n lg n) time). To start, we add a fictitious activity a_0 with f_0 = 0, so that S_0 = S, the set of all activities. The initial call that solves the entire problem is RECURSIVE-ACTIVITY-SELECTOR(s, f, 0, n).

RECURSIVE-ACTIVITY-SELECTOR(s, f, k, n)
1  m = k + 1
2  while m <= n and s[m] < f[k]     // Find the first activity
3      m = m + 1                    //   in S_k to finish
4  if m <= n
5      return {a_m} U RECURSIVE-ACTIVITY-SELECTOR(s, f, m, n)
6  else return phi

16.1.6 Figure 16.1 shows a trace of the algorithm. To analyze the run time, note that over all the recursive calls each activity is examined exactly once in line 2, for a total contribution of Theta(n). If the activities are not initially sorted, there is an added O(n lg n) cost.

An iterative greedy algorithm

The procedure is almost tail-recursive, so it is not surprising that there is an easy conversion to an iterative algorithm, which accumulates a solution set A to problem S:

GREEDY-ACTIVITY-SELECTOR(s, f)
1  n = s.length
2  A = {a_1}
3  k = 1
4  for m = 2 to n
5      if s[m] >= f[k]
6          A = A U {a_m}
7          k = m
8  return A

As with the recursive version, the run time is Theta(n) (if the activities are already sorted).

16.2 Elements of the greedy strategy

16.2.1 A greedy algorithm obtains an optimal solution by making a sequence of choices. At each point we make the choice that looks best at that time. In Section 16.1 we went through the following steps:

1. Determine the optimal substructure.
2. Develop a recursive solution.
3. Prove that at each step one of the optimal choices is the greedy choice.
4. Show that only one non-empty subproblem remains after making the greedy choice.
5. Develop a recursive algorithm implementing the greedy strategy.
6. Convert the recursive algorithm to an iterative algorithm.

This process can be shortened as follows, though there is almost always a less efficient underlying dynamic programming solution:

1. Cast the problem as making a choice and leaving one subproblem to solve.
2. Prove there is always an optimal solution that makes the greedy choice.
3. Prove that, having made the greedy choice, the optimal solution to the remaining subproblem can be combined with the greedy choice to give an optimal solution to the original problem.

16.2.2 Key ingredients of greedy algorithms: (1) the greedy-choice property, and (2) the optimal-substructure property.

Greedy-choice property

The greedy-choice property states that we can obtain a globally optimal solution by making a locally optimal (greedy) choice. We can make this choice without first solving subproblems, unlike the dynamic programming case, in which we usually need to know the solutions to subproblems _before_ making the choice; this is why dynamic programming algorithms are usually implemented bottom-up. A greedy algorithm may depend on choices made so far, but not on future choices or on solutions to subproblems not yet considered. Thus greedy algorithms are usually implemented in a top-down manner.

To use a greedy algorithm, as in Theorem 16.1, we must prove that a greedy choice can always be included in an optimal solution and that it leaves a smaller, similar problem to solve. The greedy-choice property often produces gains in efficiency by making it clear how to solve the remaining subproblem. For example, in the activity-selection problem, once we had sorted the activities by finish time, we only had to examine each activity once. This is typical of greedy algorithms: preprocessing the input or using an appropriate data structure will allow us to make the greedy choice easily and lead to an efficient algorithm.
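The activity-selection procedure just recapped transcribes directly into runnable Python; here is a sketch of GREEDY-ACTIVITY-SELECTOR (a dummy entry at index 0 preserves the pseudocode's 1-based indexing):

```python
def greedy_activity_selector(s, f):
    """s[1..n], f[1..n] hold start/finish times sorted by finish time;
    index 0 is a dummy. Returns indices of a maximum-size compatible set."""
    n = len(s) - 1
    A = [1]                    # a_1 is the first activity to finish
    k = 1                      # index of the most recently selected activity
    for m in range(2, n + 1):
        if s[m] >= f[k]:       # a_m is compatible with everything in A
            A.append(m)
            k = m
    return A

# Running example from Section 16.1 (index 0 is the dummy)
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(greedy_activity_selector(s, f))  # [1, 4, 8, 11]
```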
16.2.3 Optimal substructure

A problem exhibits optimal substructure if an optimal solution to the whole problem contains within it optimal solutions to subproblems. This is a common property of problems solved either by dynamic programming or by greedy algorithms, as we have seen. We can usually use a more direct approach to optimal substructure in a greedy algorithm, in that there is usually just one subproblem left once we have made our choice. Thus, to use a greedy algorithm, we just need to argue that the greedy choice for the original problem combined with an optimal solution to the subproblem yields an optimal solution to the original problem. This scheme implicitly uses induction on the (size of the) subproblems to prove that making the greedy choice at each step produces a globally optimal solution.

Greedy versus dynamic programming

Since greedy and dynamic programming methods both rely on the optimal-substructure property, we must be careful not to use the greedy method when only dynamic programming applies, and not to use dynamic programming when we can gain the efficiency of a greedy algorithm. We look at two versions of an optimization problem: maximizing the value of items in a knapsack.

16.2.4 The 0-1 knapsack problem: A thief robbing a store finds n items; the ith item is worth v_i dollars and weighs w_i pounds, where v_i and w_i are integers. He wants to maximize the value of the items he takes, but he can carry only (an integer) W pounds.

The fractional knapsack problem: The setup is the same, except the thief can take fractions of items (rather than the 0-1, all-or-nothing choice of the previous case). In the 0-1 case you can think of the items as solid, whereas in the fractional case you can think of them as powder.

Both problems have the optimal-substructure property.
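To make the two variants concrete before analyzing them, here is a sketch of the greedy value-per-pound strategy (discussed below) on the fractional problem. Only the weights 10, 20, and 30 pounds come from the Figure 16.2 discussion; the item values (60, 100, 120 dollars) and the capacity W = 50 are illustrative assumptions:

```python
def fractional_knapsack(items, W):
    """Greedy by value per pound. items is a list of (value, weight)
    pairs; returns the maximum total value carried."""
    total = 0.0
    # Consider items in decreasing order of value per pound
    for v, w in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(w, W)          # pour in as much as fits
        total += v * take / w
        W -= take
        if W == 0:
            break
    return total

# Weights as in Figure 16.2; values and capacity are assumptions.
items = [(60, 10), (100, 20), (120, 30)]
print(fractional_knapsack(items, 50))  # 240.0
```

With these numbers, the same greedy order applied with all-or-nothing choices would carry only items 1 and 2 (value 160), whereas the best 0-1 load is items 2 and 3 (value 220), illustrating why the 0-1 problem needs dynamic programming instead.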
For the 0-1 problem, if we remove an item, say item j, from an optimal load, the remaining load must be the most valuable load weighing at most W - w_j pounds that the thief can take from the n-1 items excluding item j. For the fractional problem, if we remove w pounds of item j from an optimal load, the remaining load must be the most valuable load weighing at most W - w pounds that the thief can take from the n-1 other items plus the w_j - w remaining pounds of item j.

We can use a greedy algorithm to solve the fractional problem but not the 0-1 problem. For the fractional problem, we compute the value per pound v_i/w_i of each item and sort the items by this ratio in decreasing order.

16.2.5 Then we make the greedy choice as follows: at each step we pour as much of the most valuable-per-pound item yet remaining into the knapsack as will fit. This strategy satisfies the greedy-choice property (Exercise 16.2-1).

The greedy strategy doesn't work for the 0-1 problem, as shown in Figure 16.2. The greedy strategy would tell us to put the 10-pound Item 1 into the knapsack first, after which there would be room only for the 20-pound Item 2 or the 30-pound Item 3, but not both, and either way we get a suboptimal solution. The best solution is to skip Item 1 and put Items 2 and 3 in the knapsack, as shown in Figure 16.2(b). For the fractional case, we put all of Items 1 and 2 in the knapsack and then as much of Item 3 as will fit, which yields an optimal solution, as shown in Figure 16.2(c). The reason that taking Item 1 doesn't work in the 0-1 case is that it leads to unused space in the knapsack, lowering the effective value per pound of the load. To solve the 0-1 problem, we must compare the optimal solution that includes an item with the optimal solution that excludes it. This gives rise to many overlapping subproblems, a hallmark of dynamic programming, which can be used to solve the 0-1 problem as indicated in Exercise 16.2-2.

16.3 Huffman codes

16.3.1 David Huffman invented these codes in 1952.
These codes can compress data with savings of 20% to 90%. The data are considered to be a sequence of characters, and the goal is to represent them by a binary code. Huffman's greedy algorithm uses a table of character frequencies to build an optimal binary representation. For example, if we have a 100,000-character file made up of six characters, we might have the frequencies given in Figure 16.3, in which the character "a" occurs 45,000 times.

Among the ways of representing the characters, we consider binary character codes (or just codes for short), in which there is a unique binary string, the codeword, for each character. We could use a fixed-length code, which would need 3 bits to represent the six characters a, b, c, d, e, f, requiring 300,000 bits for the whole file. We can do better with a variable-length code, which gives frequent characters short codewords and infrequent characters longer ones. If we use the variable-length code of Figure 16.3, we can save about 25.33%, reducing the file length to

    (45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4) * 1000 = 224,000 bits.

Prefix codes

16.3.2 A code in which no codeword is a prefix of any other codeword is called a prefix code. We restrict attention to prefix codes, which can be proven to achieve optimal compression among character codes. Encoding is simple: just concatenate the codewords. For example, using the variable-length code of Figure 16.3, abc is encoded as 0.101.100 = 0101100 (where "." denotes concatenation).

Prefix codes also simplify decoding: there is a unique codeword that is a prefix of the remaining file, which we replace by its character, and then repeat the process. For example, 00101101 parses as 0.0.101.1101, which decodes to aabe. A convenient way to decode is to use a binary tree with the characters at the leaves, where 0 denotes a branch to the left child of an internal node and 1 a branch to the right, as in Figure 16.4. Note: this is not a binary search tree.
According to Exercise 16.3-1, an optimal code is always represented by a full binary tree, in which every nonleaf node has two children. Thus the fixed-length code tree shown in Figure 16.4(a) is not optimal. So we now restrict attention to full binary trees representing codes for the alphabet C of characters (each with positive frequency); such a tree has |C| leaves and |C| - 1 internal nodes. If tree T represents the code, character c in C has frequency c.freq in the file, and d_T(c) is the depth in T of the leaf for c, then the number of bits required is

    B(T) = Sum over c in C of ( c.freq * d_T(c) ).

16.3.3 Constructing a Huffman code

We present the Huffman code construction algorithm and then show that it satisfies the greedy-choice property and has optimal substructure. As above, C is a set of n characters, where each character c in C has a frequency c.freq. The algorithm builds the tree T corresponding to an optimal code in a bottom-up manner, starting with |C| leaves and performing |C| - 1 "merges". It uses a min-priority queue Q to identify the two least-frequent subtrees to merge; the frequency of the merged node is the sum of the frequencies of its children. Figure 16.5 shows a run with the frequencies of Figure 16.3; the codeword for a character is the sequence of edge labels on the path from the root to its leaf. If we assume Q is implemented as a binary min-heap, we can build the heap in O(n) time, and EXTRACT-MIN is called 2(n-1) times, each at a cost of O(lg n), for a total cost of O(n lg n).

HUFFMAN(C)
1  n = |C|
2  Q = C
3  for i = 1 to n - 1
4      allocate new node z
5      z.left = x = EXTRACT-MIN(Q)
6      z.right = y = EXTRACT-MIN(Q)
7      z.freq = x.freq + y.freq
8      INSERT(Q, z)
9  return EXTRACT-MIN(Q)    // Return the root of T

16.3.4 Correctness of Huffman's algorithm

To prove correctness of Huffman's algorithm, we first show in Lemma 16.2 that it has the greedy-choice property, and then in Lemma 16.3 that it has the optimal-substructure property.

Lemma 16.2
Let C be as above, and let x and y be two characters in C with the lowest frequencies.
Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.

Proof: We start with a tree T representing an arbitrary optimal prefix code and modify it into a tree representing another optimal prefix code in which x and y appear as sibling leaves of maximum depth. Doing this proves the lemma, and making x and y siblings of maximum depth is a greedy choice.

Lemma 16.3
Let C, x, and y be as in Lemma 16.2. Let C' be the alphabet with x and y removed and a new character z added, with z.freq = x.freq + y.freq. Let T' be any tree representing an optimal prefix code for C'. Then the tree T, obtained from T' by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for C.

Theorem 16.4
Procedure HUFFMAN produces an optimal prefix code.

Proof: Immediate from Lemmas 16.2 and 16.3.
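Putting the pieces together, here is a runnable sketch of HUFFMAN using Python's heapq as the min-priority queue (a counter breaks frequency ties so heap tuples stay comparable; the function and variable names are ours):

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build Huffman codewords for a {char: frequency} table.
    Internal nodes are (left, right) tuples; leaves are characters."""
    tiebreak = count()
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # the two least-frequent subtrees
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: 0 = left, 1 = right
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            codes[node] = prefix or '0'  # lone-character edge case
    walk(root, '')
    return codes

# Frequencies (in thousands) from the Figure 16.3 example
freqs = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman(freqs)
print(sum(freqs[c] * len(codes[c]) for c in freqs))  # 224, i.e. 224,000 bits
```

The printed cost is B(T) for the constructed tree; it matches the 224,000-bit total computed for the variable-length code of Figure 16.3.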