Part IV Advanced Design and Analysis

IV.1 Techniques

Outline:

Chapter 15: Dynamic programming applies to optimization problems in which a set of choices must be made to arrive at an optimal solution. The key idea is to store the solution to a subproblem that can arise from more than one set of choices. A dynamic programming solution can sometimes turn an exponential-time algorithm into a polynomial-time algorithm.

Chapter 16: Greedy algorithms also apply to optimization problems in which a set of choices must be made to arrive at an optimal solution. The key idea here is to make each choice in a locally optimal way. An example is coin-changing: to minimize the number of coins given as change, repeatedly select the largest-denomination coin that is not greater than the amount still owed.

Chapter 17: Amortized analysis is a tool for analyzing algorithms that perform a sequence of operations drawn from a small set of operations. Instead of bounding the cost of each operation separately, amortized analysis bounds the cost of the entire sequence.

Chapter 15 Dynamic Programming

15.0.1 Like divide-and-conquer, dynamic programming solves problems by combining solutions to subproblems ("programming" refers to a tabular method, as in "linear programming", not to writing computer code). In contrast to divide-and-conquer, where the subproblems are independent, dynamic programming applies when subproblems share common subsubproblems. Dynamic programming solves each subsubproblem just once and saves its answer in a table. It is typically applied to optimization problems, which often have many solutions. Each solution has a value, and we wish to find a solution with the optimal value (minimum or maximum; several such solutions may exist). The four steps for developing a dynamic-programming algorithm are:

1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution in a bottom-up fashion (this may be the desired result).
4. Construct an optimal solution from the computed information (may be omitted).

15.1 Rod Cutting

15.1.1 Section 15.1 treats the problem of cutting a rod into smaller rods to maximize their total value. Section 15.2 asks how to multiply a chain of matrices to minimize the number of scalar multiplications. Section 15.3 treats the theory underlying dynamic programming. Sections 15.4 and 15.5 use dynamic programming to solve the longest common subsequence and optimal binary search tree problems.

We want to know how to cut up a steel rod of length n into integer lengths in order to maximize revenue, given the price p_i of a rod of length i, as in Figure 15.1 (page 360):

length i  | 1  2  3  4  5  6  7  8  9  10
----------|--------------------------------
price p_i | 1  5  8  9 10 17 17 20 24  30

Figure 15.2 shows the possibilities for n = 4: one rod of length 4, revenue 9; rods of lengths 1 and 3, revenue 9; two rods of length 2, revenue 10 (optimal); two rods of length 1 and one of length 2, revenue 7; four rods of length 1, revenue 4.

15.1.2 In general, we can cut a rod in 2^(n-1) ways, opting to cut or not to cut at each distance i from the left end. We use addition to denote how a rod is cut: 7 = 2 + 2 + 3 indicates that a rod of length 7 is cut into pieces of lengths 2, 2, and 3. In general, an optimal solution cuts a rod into k pieces of lengths i_1, i_2, ..., i_k, so that

    n = i_1 + i_2 + ... + i_k

and the optimal revenue is

    r_n = p_(i_1) + p_(i_2) + ... + p_(i_k)
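As a concrete check on this formulation, here is a minimal brute-force sketch in Python (the function name best_cut and the driver line are ours, not the text's): it recursively tries every length for the leftmost piece, thereby covering all 2^(n-1) decompositions, and confirms r_4 = 10 via 4 = 2 + 2 for the prices of Figure 15.1.

# Brute-force rod cutting: recursively try every length i for the
# leftmost piece; p[i] is the price of a rod of length i (p[0] unused).
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]

def best_cut(n):
    # Returns (revenue, pieces) for the best decomposition of length n.
    if n == 0:
        return 0, []
    best_rev, best_pieces = float("-inf"), []
    for i in range(1, n + 1):          # length of the leftmost piece
        rev, pieces = best_cut(n - i)  # best way to cut the remainder
        if p[i] + rev > best_rev:
            best_rev, best_pieces = p[i] + rev, [i] + pieces
    return best_rev, best_pieces

print(best_cut(4))   # (10, [2, 2]), matching Figure 15.2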
The optimal revenues for i = 1, 2, ..., 10:

r_1  =  1  from 1 = 1 (no cuts)
r_2  =  5  from 2 = 2 (no cuts)
r_3  =  8  from 3 = 3 (no cuts)
r_4  = 10  from 4 = 2 + 2
r_5  = 13  from 5 = 2 + 3
r_6  = 17  from 6 = 6 (no cuts)
r_7  = 18  from 7 = 1 + 6 or 7 = 2 + 2 + 3
r_8  = 22  from 8 = 2 + 6
r_9  = 25  from 9 = 3 + 6
r_10 = 30  from 10 = 10 (no cuts)

15.1.3 In general:

    r_n = max( p_n, r_1 + r_(n-1), r_2 + r_(n-2), ..., r_(n-1) + r_1 )    (15.1)

where p_n corresponds to making no cut, and the other n - 1 values r_i + r_(n-i) correspond to a first cut at position i. Thus to maximize r_n, we maximize the independent subproblems r_i and r_(n-i): the optimal substructure property. We simplify the analysis by assuming that the piece to the left of the first cut is not cut further; only the piece to the right may be cut again. This reduces the problem to a single subproblem: any decomposition contributes a price p_i for the left piece plus a revenue r_(n-i) for the decomposed right piece. If we use the entire rod, then i = n, and letting r_0 = 0 we obtain the simpler formula, which is Step 1 of dynamic programming:

    r_n =   max   ( p_i + r_(n-i) )    (15.2)
          1<=i<=n

Recursive top-down implementation (Step 2)

CUT-ROD(p, n)
1  if n == 0
2      return 0
3  q = -infinity
4  for i = 1 to n
5      q = max(q, p[i] + CUT-ROD(p, n - i))
6  return q

15.1.4 CUT-ROD takes an array p[1..n] of prices and an integer n as arguments, and returns the maximum revenue obtainable. Its correctness can be proved using formula (15.2). However, CUT-ROD inefficiently calls itself repeatedly on the same small inputs. Figure 15.3 (page 364) shows what happens when n = 4; each node is labeled with the size of the subproblem it solves:

(4)
+-- (3)
|   +-- (2)
|   |   +-- (1)
|   |   |   +-- (0)
|   |   +-- (0)
|   +-- (1)
|   |   +-- (0)
|   +-- (0)
+-- (2)
|   +-- (1)
|   |   +-- (0)
|   +-- (0)
+-- (1)
|   +-- (0)
+-- (0)

The number of calls is given by:

    T(0) = 1
    T(n) = 1 + Sum_{j=0}^{n-1} T(j)

which has the solution T(n) = 2^n (as Exercise 15.1-1 asks you to prove).

15.1.5 Using dynamic programming for optimal rod cutting

For Step 3 we solve each subproblem only once, saving the result in a table and looking it up whenever it is needed again. The memory for the table is an example of a time-space trade-off. A dynamic programming solution runs in polynomial time if the number of distinct subproblems is polynomial in n and each subproblem is solvable in polynomial time. Dynamic programming is implemented in 2 ways:

(1) top-down with memoization, in which the procedure first checks whether a subproblem has already been solved: if so, the answer is looked up in the table; if not, it is computed and put in the table. Such a procedure is said to be "memoized".

(2) bottom-up, which solves the smallest subproblems first, working up to the larger subproblems. Again, each subproblem is solved only once.

The two approaches have the same asymptotic running time, except in rare cases where the top-down method does not have to recurse into every subproblem. Otherwise the bottom-up approach has better constant factors, due to less overhead for recursive calls.

15.1.6
MEMOIZED-CUT-ROD(p, n)
1  let r[0..n] be a new array
2  for i = 0 to n
3      r[i] = -infinity
4  return MEMOIZED-CUT-ROD-AUX(p, n, r)

MEMOIZED-CUT-ROD-AUX(p, n, r)
1  if r[n] >= 0
2      return r[n]
3  if n == 0
4      q = 0
5  else q = -infinity
6      for i = 1 to n
7          q = max(q, p[i] + MEMOIZED-CUT-ROD-AUX(p, n - i, r))
8  r[n] = q
9  return q

This is just the memoized version of CUT-ROD.
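A direct Python transcription of the memoized procedure may help. This is a sketch under the assumption that a dictionary keyed by rod length plays the role of the array r[0..n] (the snake_case names are ours):

p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]   # p[i] = price of length i

def memoized_cut_rod(n, r=None):
    # Top-down CUT-ROD; r maps a length to its best revenue once known.
    if r is None:
        r = {}
    if n in r:
        return r[n]                 # already solved: constant-time lookup
    if n == 0:
        q = 0
    else:
        q = float("-inf")
        for i in range(1, n + 1):
            q = max(q, p[i] + memoized_cut_rod(n - i, r))
    r[n] = q                        # record the answer before returning
    return q

print(memoized_cut_rod(10))         # 30

As in the pseudocode, each subproblem is computed at most once; every later occurrence is answered by the lookup at the top.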
The bottom-up version is even simpler:

BOTTOM-UP-CUT-ROD(p, n)
1  let r[0..n] be a new array
2  r[0] = 0
3  for j = 1 to n
4      q = -infinity
5      for i = 1 to j
6          q = max(q, p[i] + r[j-i])
7      r[j] = q
8  return r[n]

Due to the doubly nested for-loops, this runs in Theta(n^2) time. It is harder to see, but MEMOIZED-CUT-ROD also runs in Theta(n^2) time.

15.1.7 Subproblem graphs

We should understand the set of subproblems involved and how they depend on one another. This information is embodied in the subproblem graph: there is an edge from x to y if finding an optimal solution to subproblem x directly requires an optimal solution to subproblem y.

[Figure 15.4 (page 367): the subproblem graph for rod cutting with n = 4. The vertices are 4, 3, 2, 1, 0, with a directed edge from each vertex i to every vertex j < i.]

The subproblem graph is a "reduced" or "collapsed" version of the top-down recursion tree: Figure 15.4 is the reduced version of Figure 15.3. In the bottom-up method, we process the vertices of the graph in an order corresponding to a "reverse topological sort"; the top-down method corresponds to a depth-first search. The time to solve a subproblem is proportional to the number of edges going out of it (its out-degree), so the total time spent solving subproblems is O(E); and since each subproblem/vertex must be solved, the total running time is typically Theta(V + E).

15.1.8 Reconstructing a solution (Step 4)

The bottom-up method reports the value of an optimal solution, but not the choices made; we extend it to record, for each subproblem size j, the optimal size s[j] of the first piece to cut off:

EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
1  let r[0..n] and s[0..n] be new arrays
2  r[0] = 0
3  for j = 1 to n
4      q = -infinity
5      for i = 1 to j
6          if q < p[i] + r[j-i]
7              q = p[i] + r[j-i]
8              s[j] = i
9      r[j] = q
10 return r and s

If we call it with n = 10, it returns the arrays:

i    | 0  1  2  3  4  5  6  7  8  9  10
-----|----------------------------------
r[i] | 0  1  5  8 10 13 17 18 22 25  30
s[i] | 0  1  2  3  2  2  6  1  2  3  10

15.1.9 The following procedure prints a list of optimal piece sizes for a rod of length n:

PRINT-CUT-ROD-SOLUTION(p, n)
1  (r, s) = EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
2  while n > 0
3      print s[n]
4      n = n - s[n]

If n = 10, it would just print 10, but if n = 7, it would print 1 and 6, corresponding to the optimal decomposition given in 15.1.2.
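For comparison, here is a Python sketch combining EXTENDED-BOTTOM-UP-CUT-ROD and PRINT-CUT-ROD-SOLUTION (the snake_case names and the driver line are ours):

p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]

def extended_bottom_up_cut_rod(n):
    # r[j] = best revenue for length j; s[j] = size of its first piece.
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s

def print_cut_rod_solution(n):
    r, s = extended_bottom_up_cut_rod(n)
    while n > 0:
        print(s[n])        # size of the next piece to cut off
        n -= s[n]

print_cut_rod_solution(7)  # prints 1, then 6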
15.2 Matrix-Chain Multiplication

15.2.1 Suppose we have a chain of matrices A_1, A_2, ..., A_n to be multiplied to obtain the product A_1A_2...A_n. Because matrix multiplication is associative, we can compute this product in several ways; we indicate the order in which to perform the multiplications by fully parenthesizing the product. There are 5 ways to parenthesize when n = 4:

(A_1(A_2(A_3A_4)))
(A_1((A_2A_3)A_4))
((A_1A_2)(A_3A_4))
((A_1(A_2A_3))A_4)
(((A_1A_2)A_3)A_4)

The parenthesization can have a big impact on the cost of the computation. Here is the standard way to multiply two matrices:

MATRIX-MULTIPLY(A, B)
1  if A.columns != B.rows
2      error "incompatible dimensions"
3  else let C be a new A.rows x B.columns matrix
4      for i = 1 to A.rows
5          for j = 1 to B.columns
6              c_ij = 0
7              for k = 1 to A.columns
8                  c_ij = c_ij + a_ik * b_kj
9      return C

The number of columns of A must equal the number of rows of B; if A is a p x q matrix and B is a q x r matrix, then C is a p x r matrix. The dominant cost is the multiplication in line 8, which is executed pqr times. For example, consider a chain of three matrices with dimensions 10x100, 100x5, and 5x50. If we parenthesize as ((A_1A_2)A_3), the cost is 10*100*5 = 5000 scalar multiplications to compute A_1A_2, plus 10*5*50 = 2500 to multiply the result by A_3, for a total of 7500. If we parenthesize as (A_1(A_2A_3)), the cost is 100*5*50 = 25,000 to compute A_2A_3, plus 10*100*50 = 50,000 to multiply A_1 by the result, for a total of 75,000 scalar multiplications, ten times as many.

15.2.2 The matrix-chain multiplication problem: given a chain of n matrices where A_i has dimensions p_(i-1) x p_i, fully parenthesize the product A_1A_2...A_n so as to minimize the number of scalar multiplications. Note that the cost of finding this parenthesization is typically far less than the cost of actually multiplying the matrices.

Counting the number of parenthesizations

Iterating through all parenthesizations is not efficient. Let P(n) be the number of parenthesizations of a product of n matrices. P(1) = 1; for n > 1, the number of parenthesizations that split between the k-th and (k+1)-st matrices is P(k)*P(n-k), and since we can split at any k = 1, 2, ..., n-1, the recurrence for P is:

    P(n) = 1                              if n = 1
    P(n) = Sum_{k=1}^{n-1} P(k) P(n-k)    if n > 1

The solution is P(n) = C(n-1), where C(n) = B(2n, n)/(n+1) = Theta(4^n / n^(3/2)) is the n-th Catalan number and B(2n, n) is the central binomial coefficient.

15.2.3 Step 1. Characterize the structure of an optimal solution.

Let A_i..j denote the product A_i A_(i+1) ... A_j. If A_1..n = (A_1..k)(A_(k+1)..n) is an optimal parenthesization, then A_1..k and A_(k+1)..n must themselves be optimally parenthesized. This is the first hallmark of the applicability of dynamic programming.

Step 2. Recursively define the value of an optimal solution.

Let m[i,j] be the minimum number of scalar multiplications needed to compute A_i..j. Then:

    m[i,j] = 0                                                       if i = j
    m[i,j] = min_{i<=k<j} { m[i,k] + m[k+1,j] + p_(i-1) p_k p_j }    if i < j

Here p is the dimension array, with p.length = n + 1. A second table s[i,j] stores the index k at which to split A_i..j to achieve the minimum cost.

15.2.4
MATRIX-CHAIN-ORDER(p)
1  n = p.length - 1
2  let m[1..n, 1..n] and s[1..n-1, 2..n] be new tables
3  for i = 1 to n
4      m[i,i] = 0
5  for l = 2 to n            // l = length of chain
6      for i = 1 to n - l + 1
7          j = i + l - 1
8          m[i,j] = infinity
9          for k = i to j - 1
10             q = m[i,k] + m[k+1,j] + p_(i-1) p_k p_j
11             if q < m[i,j]
12                 m[i,j] = q
13                 s[i,j] = k    // k = best split so far
14 return m and s

The minimum cost is m[1,n]. Figure 15.5 shows an example with n = 6. Since we use only half of each table, the tables are drawn rotated 45 degrees counterclockwise. The outer loop fills entries one diagonal at a time, from the bottom (formerly the main diagonal) up to the top entries m[1,n] and s[1,n] (the latter tells where to make the first split). The triply nested loop structure gives a running time of O(n^3), since each loop runs at most n times. A careful count shows that the innermost loop body executes (1/6)n^3 - n/6 times, so the running time is in fact Theta(n^3).

15.2.5 Step 4. Constructing an optimal solution.

Each entry s[i,j] records where to split A_i..j to obtain the minimum cost. So s[1,n] tells where to make the first split; then, recursively, s[1, s[1,n]] tells where to split the left half, s[s[1,n]+1, n] tells where to split the right half, and so on. The following procedure prints the optimal parenthesization, given the initial call PRINT-OPTIMAL-PARENS(s, 1, n):

PRINT-OPTIMAL-PARENS(s, i, j)
1  if i == j
2      print "A"_i
3  else print "("
4      PRINT-OPTIMAL-PARENS(s, i, s[i,j])
5      PRINT-OPTIMAL-PARENS(s, s[i,j]+1, j)
6      print ")"

In the example of Figure 15.5, the call PRINT-OPTIMAL-PARENS(s, 1, 6) prints the parenthesization ((A_1(A_2A_3))((A_4A_5)A_6)).
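The two procedures translate to Python as follows. This sketch keeps the text's 1-based indexing by padding row and column 0 of each table; the snake_case names and the sample driver are ours:

import sys

def matrix_chain_order(p):
    # p[0..n] holds the dimensions: matrix A_i is p[i-1] x p[i].
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]  # m[i][j] = min cost of A_i..j
    s = [[0] * (n + 1) for _ in range(n + 1)]  # s[i][j] = best split point k
    for l in range(2, n + 1):                  # l = length of the chain
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = sys.maxsize
            for k in range(i, j):              # try each split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

def optimal_parens(s, i, j):
    # Rebuild the parenthesization from the split table s.
    if i == j:
        return "A%d" % i
    return ("(" + optimal_parens(s, i, s[i][j])
                + optimal_parens(s, s[i][j] + 1, j) + ")")

m, s = matrix_chain_order([10, 100, 5, 50])
print(m[1][3], optimal_parens(s, 1, 3))        # 7500 ((A1A2)A3)

Running it on the three-matrix chain from 15.2.1 reproduces the minimum cost of 7500 and the parenthesization ((A_1A_2)A_3).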
15.3 Elements of Dynamic Programming

15.3.1 What is necessary in order to apply dynamic programming? Answer: (1) optimal substructure, and (2) overlapping subproblems. We will also look at the memoization method.

Optimal substructure

Definition: A problem has optimal substructure if an optimal solution contains within it optimal solutions to subproblems. This is one indication that a problem might have a dynamic programming solution, though it might instead have a greedy-algorithm solution. We have seen optimal substructure in all the problems solved by dynamic programming so far. Here is a pattern for finding optimal substructure:

1. Show that a solution consists of making a choice: where to make the first cut of a rod, a "splitting index" for a matrix chain, or an intermediate vertex on a shortest path.
2. Assume that you are given the choice that leads to an optimal solution.
3. Given this choice, determine the ensuing subproblems and how best to characterize the space of subproblems.
4. Show that the solutions to the subproblems used within an optimal solution must themselves be optimal, using a "cut-and-paste" argument: suppose some subsolution were non-optimal; by replacing it with an optimal one, we would improve the value of the whole solution, so the original solution was not optimal after all, a contradiction.

15.3.2 In characterizing the space of subproblems, we should keep it as simple as possible. In the rod-cutting case, we need only consider cutting a rod of length i for each size i; in the matrix-chain case, the subproblems are of the form A_i..j, where both i and j vary (giving a two-dimensional space of subproblems).

Optimal substructure varies in two ways: (1) how many subproblems an optimal solution uses, and (2) how many choices we have in determining which subproblems to use. The rod-cutting problem uses one subproblem (of size n - i), but we must consider n choices for i. In the matrix-chain case, solving A_i..j uses two subproblems, A_i..k and A_(k+1)..j, and there are j - i ways of picking k.

15.3.3 Informally, the cost of a dynamic programming algorithm is the product of the number of subproblems and the number of choices examined for each one. In rod cutting there were Theta(n) subproblems with at most n choices each, for Theta(n^2) total cost. In the matrix-chain case there were Theta(n^2) subproblems with at most n - 1 choices each, giving O(n^3) total cost; in fact the cost is Theta(n^3).

Dynamic programming builds a solution bottom-up: first find optimal solutions to subproblems, then choose among them to construct an optimal solution to the whole problem. The cost is therefore the cost of solving the subproblems plus the cost of making the choice. In rod cutting, we first found the costs of cutting rods of lengths 0, 1, ..., n-1 and then chose the split giving an optimal solution for a rod of length n; the cost of choice i is the term p_i in equation (15.2). In the matrix-chain case, the cost of a choice was the term p_(i-1) p_k p_j.

Greedy algorithms (Chapter 16) have some similarities to dynamic programming algorithms; in particular, both exploit optimal substructure. The difference is that greedy algorithms work top-down, making the best (greedy) choice at the time _before_ knowing the solutions to the subproblems; only after making the choice do they solve the subproblems.

15.3.4 Subtleties

We must take care in identifying optimal substructure. Consider the following two problems on a directed graph G = (V, E) and vertices u and v:

Unweighted shortest path. Find a path from u to v with the fewest edges (such a path must be simple, since removing a cycle from a path only reduces the number of edges).

Unweighted longest simple path.
Find a simple path from u to v with the most edges (we must require simplicity, since otherwise we could traverse a cycle repeatedly to get an arbitrarily high edge count).

The unweighted shortest path problem has the optimal substructure property by the usual argument: if a subpath of an optimal path were not optimal, it could be replaced by a shorter subpath, giving a total path shorter than the original "optimal" path. However, the unweighted longest simple path problem does not have the optimal substructure property, as shown by Figure 15.6:

(q)<-->(r)
 ^      ^
 |      |
 v      v
(s)<-->(t)

Here q --> r --> t is a longest simple path from q to t, but neither of its subpaths q --> r nor r --> t is a longest simple path between its endpoints (for instance, q --> s --> t --> r is a longer simple path from q to r).

15.3.5 There is no known good dynamic programming solution to this problem; in fact it is NP-complete, which means it probably cannot be solved in polynomial time. The distinction between these two problems is that the subproblems of the unweighted shortest path problem are independent: finding a shortest path from q to r does not interfere with finding a shortest path from r to t (if the two shared a vertex, we would get a cycle, which we have seen cannot occur in a shortest path). On the other hand, a longest simple path from q to r uses all of the vertices, leaving none available for a longest simple path from r to t, so finding the first longest subpath _does_ affect finding the second. In the matrix-chain case, multiplying A_i..k and multiplying A_(k+1)..j are independent subproblems. In the rod-cutting case, once we determine where to make the first cut, "sub-cutting" the two resulting pieces consists of independent subproblems.

15.3.6 Overlapping subproblems

The second ingredient of a problem amenable to dynamic programming is that it have "overlapping subproblems": the natural recursive algorithm has to solve the same subproblems many times, as we saw in the rod-cutting case, where the naive recursion makes 2^n calls to cut a rod of length n. In contrast, the dynamic programming solution takes Theta(n^2) time. For dynamic programming to be effective, the space of distinct subproblems is usually polynomial in the input size. A dynamic programming algorithm takes advantage of overlapping subproblems by solving each one once and storing the result in a table, after which the result can simply be looked up in constant time.

In the matrix-chain case, the smaller subproblems are looked up many times. Figure 15.7 shows the recursion tree for four matrices: there are only 10 distinct subproblems, although the unmemoized recursion tree contains far more nodes. Consider:

RECURSIVE-MATRIX-CHAIN(p, i, j)
1  if i == j
2      return 0
3  m[i,j] = infinity
4  for k = i to j - 1
5      q = RECURSIVE-MATRIX-CHAIN(p, i, k)
         + RECURSIVE-MATRIX-CHAIN(p, k+1, j)
         + p_(i-1) p_k p_j
6      if q < m[i,j]
7          m[i,j] = q
8  return m[i,j]

15.3.7 We show that RECURSIVE-MATRIX-CHAIN needs Omega(2^n) time to compute m[1,n]. Letting T(n) be the time to compute m[1,n], we have:

    T(1) >= 1
    T(n) >= 1 + Sum_{k=1}^{n-1} ( T(k) + T(n-k) + 1 )    for n > 1

Note that each T(i) occurs twice in the sum, once as T(k) and once as T(n-k), and the 1's add up to n altogether, so we have:

    T(n) >= n + 2 Sum_{i=1}^{n-1} T(i)

Now we prove T(n) >= 2^(n-1) by induction. Certainly T(1) >= 1 = 2^(1-1), and for n > 1:

    T(n) >= n + 2 Sum_{i=1}^{n-1} 2^(i-1)
         =  n + 2 Sum_{i=0}^{n-2} 2^i
         =  n + 2 (2^(n-1) - 1)
         =  n + 2^n - 2
         >= 2^(n-1)

Thus T(n) = Omega(2^n).

15.3.8 The bottom-up dynamic programming solution is more efficient because it takes advantage of single solutions to the Theta(n^2) distinct overlapping subproblems.
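To see the blow-up concretely, this Python sketch instruments a direct transcription of RECURSIVE-MATRIX-CHAIN with a call counter (the counter, names, and sample dimensions are ours):

calls = 0

def recursive_matrix_chain(p, i, j):
    # Direct transcription; re-solves the same subproblems over and over.
    global calls
    calls += 1
    if i == j:
        return 0
    best = float("inf")
    for k in range(i, j):
        q = (recursive_matrix_chain(p, i, k)
             + recursive_matrix_chain(p, k + 1, j)
             + p[i - 1] * p[k] * p[j])
        best = min(best, q)
    return best

p = [10, 100, 5, 50, 20, 10]      # five matrices (dimensions made up)
recursive_matrix_chain(p, 1, 5)
print(calls)                      # 81 calls for only 15 distinct subproblems

The call count C(n) satisfies C(1) = 1 and C(n) = 1 + 2 Sum_{k=1}^{n-1} C(k), which solves to C(n) = 3^(n-1), comfortably above the 2^(n-1) lower bound just proved, while the number of distinct subproblems is only n(n+1)/2.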
The recursive algorithm solves the same subproblem afresh each time it reappears in the recursion tree. So whenever the same subproblem occurs repeatedly in the recursion tree and the total number of distinct subproblems is small, there may be a dynamic programming solution.

Reconstructing an optimal solution

It may be possible to derive the optimal choices (Step 4) from the optimal costs computed in Step 3, but usually we store the choice made for each subproblem in a table as we go along. In the matrix-chain case, the table s[i,j] lets us reconstruct each choice in Theta(1) time; without it, we would have to examine the j - i possibilities for parenthesizing A_i A_(i+1) ... A_j, which takes Theta(j-i) time.

15.3.9 Memoization

It is possible to make the natural recursive solution to a dynamic programming problem as efficient as the bottom-up dynamic programming solution by memoizing it. The idea is to maintain a table as usual, but to compute each entry the first time its subproblem is encountered and simply look up the result on every subsequent encounter. Here is the memoized version of RECURSIVE-MATRIX-CHAIN:

MEMOIZED-MATRIX-CHAIN(p)
1  n = p.length - 1
2  let m[1..n, 1..n] be a new table
3  for i = 1 to n
4      for j = i to n
5          m[i,j] = infinity
6  return LOOKUP-CHAIN(m, p, 1, n)

LOOKUP-CHAIN(m, p, i, j)
1  if m[i,j] < infinity
2      return m[i,j]
3  if i == j
4      m[i,j] = 0
5  else for k = i to j - 1
6      q = LOOKUP-CHAIN(m, p, i, k)
         + LOOKUP-CHAIN(m, p, k+1, j)
         + p_(i-1) p_k p_j
7      if q < m[i,j]
8          m[i,j] = q
9  return m[i,j]

15.3.10 Figure 15.7 shows how MEMOIZED-MATRIX-CHAIN saves time compared to RECURSIVE-MATRIX-CHAIN; shaded subtrees represent values that are looked up rather than recomputed. There are two kinds of calls to LOOKUP-CHAIN:

1. calls in which m[i,j] = infinity, so lines 3-9 execute;
2. calls in which m[i,j] < infinity, so line 2 simply returns m[i,j].

There are Theta(n^2) calls of the first kind, one per table entry. And for each table entry there are O(n) lookups from line 2, for a total of O(n^3) time (in fact Theta(n^3)), asymptotically the same as the bottom-up dynamic programming solution. So memoization converts an Omega(2^n) recursive algorithm into an O(n^3) algorithm.

In general, if all the subproblems must be solved at least once, the bottom-up dynamic programming solution beats the memoized recursive solution by a constant factor, since it is simpler and avoids the overhead of recursive calls. Moreover, for some problems the time or space requirements of the bottom-up solution can be reduced even further. On the other hand, if not all subproblems need be computed, the memoized algorithm saves time by not computing them.
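A Python transcription of the memoized version, as a sketch (we fold LOOKUP-CHAIN into a nested function rather than passing m around; the names are ours):

def memoized_matrix_chain(p):
    # Top-down matrix-chain order; infinity marks "not yet computed".
    n = len(p) - 1
    m = [[float("inf")] * (n + 1) for _ in range(n + 1)]

    def lookup_chain(i, j):
        if m[i][j] < float("inf"):
            return m[i][j]            # second-kind call: just a lookup
        if i == j:
            m[i][j] = 0
        else:
            for k in range(i, j):     # first-kind call: fill the entry
                q = (lookup_chain(i, k) + lookup_chain(k + 1, j)
                     + p[i - 1] * p[k] * p[j])
                if q < m[i][j]:
                    m[i][j] = q
        return m[i][j]

    return lookup_chain(1, n)

print(memoized_matrix_chain([10, 100, 5, 50]))   # 7500

On the dimensions from 15.2.1 it returns 7500, matching MATRIX-CHAIN-ORDER.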