Chapter 16 Greedy Algorithms 16.0.1
Each step of an optimization algorithm often
has a set of options. We _could_ examine each
one, as in dynamic programming. But for some
problems, always picking the choice that looks
best at the moment gives the right answer.
Def. A greedy algorithm is one that always
makes the choice seeming best at the moment.
Not all optimization problems can be solved
by greedy algorithms, of course.
Section 16.1 presents the activity-selection
problem, which can be solved by a greedy
algorithm. This is done by first finding a
dynamic programming solution, and then showing
that the greedy choice works.
Section 16.2 reviews the theory of greedy
algorithms, including proof of correctness.
Section 16.3 treats an application of greedy
techniques: Huffman codes. Section 16.4 deals
with matroids, for which greedy algorithms
always produce an optimal solution. Section
16.5 presents an application of matroids.
The minimum-spanning-tree algorithms (Kruskal
and Prim) of Chapter 23, Dijkstra's single-
source shortest-path algorithm, and Chvatal's
greedy set-covering heuristic (Chapter 35) are
all examples of greedy algorithms.
16.1 An activity-selection problem 16.1.1
Problem: we have a set S = {a_1,a_2,...,a_n}
of activities that wish to use a resource that
can be used by only one activity at a time.
Each
activity has a start time s_i and a finish
time f_i, with 0 <= s_i < f_i < infinity. Two
activities a_i and a_j are compatible if
f_i <= s_j or f_j <= s_i. The problem: find
a maximum-sized subset of mutually compatible
activities. Here is an example:
i | 1 2 3 4 5 6 7 8 9 10 11
----+---------------------------------
s_i | 1 3 0 5 3 5 6 8 8 2 12
f_i | 4 5 6 7 8 9 10 11 12 13 14
The set {a_3,a_9,a_11} is mutually compatible
but not maximal; {a_1,a_4,a_8,a_11} is such a
maximal mutually compatible set - and so is
{a_2,a_4,a_9,a_11}.
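As a quick sanity check, the compatibility
condition and the example sets above can be
verified directly. A small Python sketch, with
the start and finish times copied from the
table (position 0 is unused padding):

```python
# Start and finish times from the example table;
# a dummy entry at index 0 lets activity i live
# at position i.
s = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [None, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

def compatible(i, j):
    # a_i and a_j are compatible if one finishes
    # before the other starts
    return f[i] <= s[j] or f[j] <= s[i]

def mutually_compatible(A):
    # every pair of activities in A must be
    # compatible
    return all(compatible(i, j)
               for i in A for j in A if i < j)

print(mutually_compatible([3, 9, 11]))     # True
print(mutually_compatible([1, 4, 8, 11]))  # True
```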
We solve the problem in several steps. First
we use dynamic programming to find a solution
in which each choice leaves two subproblems
to solve. We then observe that making the
greedy choice leaves one of the two
subproblems empty, so only one need be
considered. This allows us to design a
recursive solution, which we then convert to
an iterative one. The development illustrates
the relation between dynamic programming and
greedy algorithms.
The optimal substructure of the 16.1.2
activity-selection problem
In dynamic programming we first find the
optimal substructure of an optimal solution.
Define the following sets:
S_ij = {a_k in S : f_i <= s_k < f_k <= s_j }
i.e. S_ij is the subset of activities that can
start after a_i and finish before a_j. To
avoid special cases, introduce fictitious
activities a_0 and a_(n+1) with the convention
f_0 = 0 and s_(n+1) = infinity, so that
S = S_0,(n+1). We further order the
activities by increasing finish time:
f_0 <= f_1 <= f_2 <= ... <= f_n < f_(n+1) (1)
It is easy to see that S_ij = phi if i >= j:
any a_k in S_ij would satisfy
f_i <= s_k < f_k <= s_j < f_j, giving
f_i < f_j, which contradicts f_i >= f_j
(implied by i >= j and the ordering (1)).
Suppose we want to solve the subproblem S_ij
and we choose activity a_k in S_ij, then the
solution is the number of activities in S_ik
plus the number of activities in S_kj plus 1
(to count a_k).
The optimal substructure of the S_ij problem
is as follows. Suppose an optimal solution
A_ij includes a_k, then the solutions A_ik and
A_kj used within A_ij must also be optimal -
otherwise we could replace either of them by a
larger solution, which would lead to a larger
solution than A_ij of the S_ij problem.
16.1.3
We show how to construct an optimal solution
from optimal solutions to subproblems. Any
optimal solution A_ij to a non-empty problem
S_ij contains some a_k, together with optimal
solutions A_ik and A_kj to the subproblems
S_ik and S_kj. Thus an optimal solution
is: A_ij = A_ik union {a_k} union A_kj (2)
A recursive solution
For Step 2 of dynamic programming, we let
c[i,j] = the number of activities in a maximal
subset of mutually compatible activities in
S_ij. We have c[i,j] = 0 when S_ij = phi; in
particular for i >= j. So by equation (2)
c[i,j] = c[i,k] + c[k,j] + 1
which assumes we know the value of k, but we
don't, so we have to examine the j - i - 1
possible values of k, k = i+1, i+2, ..., j-1.
So the full recursive definition of c[i,j] is:
         / 0                     if S_ij = phi
c[i,j] = <
         \ max {c[i,k]+c[k,j]+1} if S_ij != phi
          i<k<j
Theorem 16.1 Consider any non-empty subproblem
S_ij, and let a_m be the activity in S_ij with
the earliest finish time. Then (1) a_m is used
in some maximum-size subset of mutually
compatible activities of S_ij, and (2) the
subproblem S_im is empty, so choosing a_m
leaves S_mj as the only non-empty subproblem.
Thus the first activity chosen, a_(m_1), is
the one with the earliest finish time; the
second, a_(m_2), is the first activity in
S_(m_1),(n+1); the third, a_(m_3), is the
first activity in S_(m_2),(n+1), etc.
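As a check on the recurrence, c[i,j] can be
evaluated directly with memoization. A Python
sketch using the example data, with fictitious
activities a_0 and a_(n+1) appended (the value
of s_0 is irrelevant and set to 0):

```python
from functools import lru_cache

# Example activities plus fictitious a_0 and
# a_(n+1), with f_0 = 0 and s_(n+1) = infinity.
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12,
     float("inf")]
f = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
     float("inf")]
n = 11

@lru_cache(maxsize=None)
def c(i, j):
    # size of a maximum subset of mutually
    # compatible activities in S_ij
    best = 0
    for k in range(i + 1, j):
        # a_k is in S_ij iff f_i <= s_k and
        # f_k <= s_j (s_k < f_k always holds)
        if f[i] <= s[k] and f[k] <= s[j]:
            best = max(best,
                       c(i, k) + c(k, j) + 1)
    return best

print(c(0, n + 1))   # 4
```

The answer 4 matches the maximal sets found by
hand, e.g. {a_1,a_4,a_8,a_11}.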
Note that the finish times will be strictly
increasing as we proceed. In fact we only
need to consider each activity once, in
increasing order of finish times.
Also, since a_m was chosen with earliest
possible finish time among the possibilities,
this leaves the maximum amount of time for the
remaining activities so it is a greedy choice.
A recursive greedy algorithm
The procedure RECURSIVE-ACTIVITY-SELECTOR has
as input arrays of start and finish times and
the indices i and j of the subproblem S_ij to
solve. It returns A_ij, a maximal subset of
S_ij of mutually compatible activities. We
assume the n activities are ordered by
increasing finish times as in Equation (1).
RECURSIVE-ACTIVITY-SELECTOR( s, f, i, j )
1 m <- i + 1
2 while m < j and s_m < f_i |> Find first
3 do m <- m + 1 |> activity in S_ij
4 if m < j
5 then return {a_m} union
RECURSIVE-ACTIVITY-SELECTOR(s,f,m,j)
6 else return phi
The initial call to solve the problem 16.1.6
is: RECURSIVE-ACTIVITY-SELECTOR(s, f, 0, n+1).
To analyze the run time, we note that over all
the recursive calls each activity is examined
exactly once in line 2, for a contribution of
Theta(n). If the activities are not initially
sorted, there is an added cost of O(n lg n).
Figure 16.1 shows a trace of the algorithm.
An iterative greedy algorithm
The algorithm is almost tail-recursive, so
it is not surprising that there is an easy
conversion to an iterative algorithm, which
accumulates a solution set A to problem S:
GREEDY-ACTIVITY-SELECTOR( s, f )
1 n <- length[s]
2 A <- {a_1}
3 i <- 1
4 for m <- 2 to n do
5 if s_m >= f_i then
6 A <- A union {a_m}
7 i <- m
8 return A
As with the recursive version, the run time
is Theta(n) (if the activities are sorted).
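GREEDY-ACTIVITY-SELECTOR translates almost
line for line into Python. A sketch, again
assuming the activities arrive sorted by
increasing finish time:

```python
def greedy_activity_selector(s, f):
    # s and f are 1-indexed via a dummy entry
    # at position 0; activities are sorted by
    # increasing finish time
    n = len(s) - 1
    A = [1]    # a_1 has the earliest finish
    i = 1      # index of last selected activity
    for m in range(2, n + 1):
        if s[m] >= f[i]:   # a_m is compatible
            A.append(m)
            i = m
    return A

s = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [None, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(greedy_activity_selector(s, f))
# [1, 4, 8, 11]
```

On the example data this selects
{a_1,a_4,a_8,a_11}, one of the maximal sets
identified earlier.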
Elements of the greedy strategy 16.2.1
A greedy algorithm gets an optimal solution
by making a sequence of choices. At each
point we make the choice that is best at that
time. In Sec. 16.1 we went through the steps:
1. Determine the optimal substructure.
2. Develop a recursive solution.
3. Prove that at each step one of the optimal
choices is the greedy choice.
4. Show there is only one non-empty subproblem
left after making the greedy choice.
5. Develop a recursive algorithm implementing
the greedy strategy.
6. Convert the recursive algorithm to an
iterative algorithm.
This process can be shortened as follows,
though there is almost always a less efficient
underlying dynamic programming solution.
1. Convert the problem to making a choice and
leaving one subproblem to solve.
2. Prove there is always an optimal solution
that makes the greedy choice.
3. Prove that having made a greedy choice, the
optimal solution to remaining subproblem can
be combined with the greedy choice to give
an optimal solution to the original problem.
Key ingredients of greedy algorithms: 16.2.2
(1) the greedy-choice property, and
(2) the optimal substructure property.
Greedy-choice property
The greedy-choice property states that a
globally optimal solution can be obtained by
making the locally optimal (greedy) choice.
This choice is made without first solving sub-
problems, unlike the dynamic programming case,
in which we usually need to know the solutions
to subproblems _before_ making the choice.
Thus dynamic programming algorithms are
usually implemented bottom-up. A greedy
algorithm may depend on choices made so far,
but it cannot depend on future choices (i.e.
solutions to subproblems not yet considered).
Thus greedy algorithms are usually implemented
in a top-down manner.
To use a greedy algorithm, as in Theorem 16.1
we must prove that a greedy choice can always
be included in an optimal solution and that it
leaves a smaller, similar problem to solve.
The greedy-choice property often produces
gains in efficiency by making it clear how to
solve the remaining subproblem. For example,
in the activity-selection problem, once we had
sorted the activities by finishing time, we
only had to examine each activity once. This
is typical of greedy algorithms: preprocessing
input or using an appropriate data structure
will allow us to make the greedy choice easily
and lead to an efficient algorithm.
Optimal substructure 16.2.3
A problem exhibits optimal substructure if an
optimal solution to the whole problem has
within it optimal solutions to subproblems.
This is a common property of problems solved
either by dynamic programming or by greedy
algorithms, as we have seen.
We can usually use a more direct approach to
optimal substructure in a greedy algorithm, in
that there is usually just one subproblem when
we have made our choice. Thus, to use a
greedy algorithm, we just need to argue that
the greedy choice for the original problem and
an optimal solution to the subproblem will
yield an optimal solution to the original
problem. This scheme implicitly uses
induction on the (size of the) subproblems to
prove that making the greedy choice at each
step produces a globally optimal solution.
Greedy versus dynamic programming
Since greedy and dynamic programming methods
both share the optimal substructure property,
we must be careful to not use the greedy
method when only dynamic programming applies,
and not use dynamic programming when we can
gain the efficiency of a greedy algorithm. We
look at 2 versions of an optimization problem:
maximizing the value of items in a knapsack.
The 0-1 knapsack problem: 16.2.4
A thief robbing a store finds n items, the ith
item is worth v_i dollars & weighs w_i pounds,
where v_i and w_i are integers. The thief
wants to maximize the value of what he takes,
but he can only carry (an integer) W pounds.
Which items should he take?
The fractional knapsack problem:
The setup is the same, except the thief can
take fractions of items (rather than the 0-1,
all-or-nothing choice of the previous case).
In the 0-1 case you can think of the items as
being solid whereas in the fractional case you
can think of them as being powder.
Both problems have the optimal substructure
property. For the 0-1 problem, if we remove
an item, say item j, from the knapsack, the
remaining load must be the most valuable for
items weighing at most W - w_j pounds that the
thief can take from the n-1 items excluding
item j. For the fractional problem, if we
remove w pounds of item j, then the remaining
load must be the most valuable for items
weighing at most W - w pounds that the thief
can take from the n-1 original items plus
w_j - w pounds of item j.
We can use a greedy algorithm to solve the
fractional problem but not the 0-1 problem.
For the fractional problem, we compute the
value per pound v_i/w_i and sort these ratios
in decreasing order. 16.2.5
Then we make the greedy choice as follows: at
each step we pour as
much of the most valuable-per-pound item yet
remaining into the knapsack as will fit. It
turns out that this strategy satisfies the
greedy-choice property (Exercise 16.2-1).
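This greedy strategy for the fractional
problem can be sketched in Python as follows
(the item values and the capacity below are
illustrative assumptions, not taken from the
text):

```python
def fractional_knapsack(values, weights, W):
    # take items in decreasing order of value
    # per pound, splitting the last item taken
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1],
                   reverse=True)
    total = 0.0
    for v, w in items:
        if W <= 0:
            break
        take = min(w, W)  # all of it, or what fits
        total += v * take / w
        W -= take
    return total

# hypothetical values; weights of 10, 20, 30 lb
print(fractional_knapsack([60, 100, 120],
                          [10, 20, 30], 50))
# 240.0
```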
This greedy strategy doesn't work for the 0-1
problem, as shown in Figure 16.2. The greedy
strategy would tell us to put the 10-pound
Item 1 into the knapsack first, after which
there would only be room for the 20-pound Item
2 or the 30-pound Item 3, both of which lead
to suboptimal solutions. The best solution is
to skip Item 1 and put Items 2 and 3 in the
knapsack, as shown in Figure 16.2 (b).
For the fractional case, we put all of Items
1 and 2 in the knapsack, and then as much of
Item 3 as will fit, which yields an optimal
solution, as shown in Figure 16.2 (c). The
reason that using Item 1 doesn't work in the
0-1 case is that it leads to unused space in
the knapsack, lowering the overall value-per-
pound. To solve the 0-1 problem, we must
compare the maximal solution that includes an
item with the maximal solution that does not
include it. This strategy gives rise to many
overlapping subproblems, a hallmark of dynamic
programming, which can be used to solve the
0-1 problem as indicated in Exercise 16.2-2.
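A standard bottom-up dynamic-programming
solution to the 0-1 problem can be sketched in
Python with the same illustrative (assumed)
item values: on these numbers the ratio-greedy
strategy reaches only 160, while the DP finds
the optimum of 220 by skipping Item 1.

```python
def knapsack_01(values, weights, W):
    # best[w] = maximum value achievable with
    # capacity w using the items seen so far
    best = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # iterate capacities downward so each
        # item is used at most once
        for w in range(W, wt - 1, -1):
            best[w] = max(best[w],
                          best[w - wt] + v)
    return best[W]

print(knapsack_01([60, 100, 120],
                  [10, 20, 30], 50))   # 220
```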
Huffman codes 16.3.1
David Huffman invented these codes in 1952.
These codes can compress data with savings of
20% to 90%. The data are considered to be
characters and the goal is to represent them
by a binary code. Huffman's greedy algorithm
uses a table of character frequencies to build
an optimal binary code representation of them.
For example if we have a 100,000-character
file made up of six characters, we might have
frequencies as given in Figure 16.3, in which
the character "a" occurs 45,000 times.
Among the ways of representing the characters
we consider binary character codes (or just
codes for short), in which there is a unique
binary string for each character. We could
use a fixed-length code, which would need 3
bits to represent six characters a,b,c,d,e,f,
requiring 300,000 bits for the whole file.
We can do better with a variable-length code
which gives frequent characters short codes
and infrequent characters longer codes. If we
use the variable-length codes in Figure 16.3,
we can reduce the file length to 224,000 =
(45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4)*1000
bits, a saving of about 25.33%.
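The arithmetic can be checked in a few lines,
assuming the Figure 16.3 codewords are a=0,
b=101, c=100, d=111, e=1101, f=1100 (these
have exactly the lengths used in the sum
above):

```python
# character frequencies, in thousands
freqs = {"a": 45, "b": 13, "c": 12,
         "d": 16, "e": 9, "f": 5}
# assumed variable-length code of Figure 16.3
code = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}

fixed_bits = 3 * sum(freqs.values()) * 1000
var_bits = sum(freqs[ch] * len(code[ch])
               for ch in freqs) * 1000
print(fixed_bits)                 # 300000
print(var_bits)                   # 224000
print(1 - var_bits / fixed_bits)  # ~0.2533
```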
Prefix codes 16.3.2
A code in which no codeword is a prefix of
any other codeword is called a prefix code.
We restrict attention to prefix codes: it can
be shown that the optimal compression
achievable by a character code can always be
achieved by a prefix code, so nothing is lost.
Encoding using binary character codes is
simple: just concatenate the codewords. For
example, using the variable-length code of
Figure 16.3, abc is encoded as 0.101.100 =
0101100 (where "." denotes concatenation).
Prefix codes simplify decoding: there is a
unique codeword that is a prefix to the file,
which we replace by its character, and then
repeat the process. For example, 001011101 is
0.0.101.1101, which decodes to aabe.
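The decoding loop is a few lines of Python; a
sketch, again assuming the Figure 16.3
codewords:

```python
def decode(bits, code):
    # Because no codeword is a prefix of
    # another, greedily matching the first
    # complete codeword is unambiguous.
    inverse = {cw: ch
               for ch, cw in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}
print(decode("001011101", code))   # aabe
```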
A convenient way to decode is to use a binary
tree with the characters in the leaves, and
internal nodes with 0 denoting a branch to the
left and 1 a branch to the right as in Figure
16.4. Note: this is not a binary search tree.
According to Exercise 16.3-1, an optimal code
is always represented by a full binary tree,
in which each nonleaf node has two children.
Thus the fixed-length code tree shown in
Figure 16.4 (a)
is not optimal. So now we restrict attention
to full binary trees representing the alphabet
C of characters (with positive frequency), so
there are |C| leaves and |C|-1 internal nodes.
If tree T represents the code, character c in
C has frequency f(c) in the file, and d_T(c)
is the depth in T of c's leaf, then the number
of bits required to encode the file is
B(T) = Sum_{c in C} f(c)*d_T(c)
Constructing a Huffman code 16.3.3
We first show the algorithm for building a
Huffman code and then later show that it
satisfies the greedy-choice property and has
optimal substructure.
As above, C is a set of n characters where
each character c in C has a frequency f[c].
The algorithm builds the tree T corresponding
to the optimal code in a bottom-up manner,
starting with |C| leaves and performing |C|-1
"merge"s. It uses a min-priority queue Q to
identify the two least frequent subtrees to
merge. The "frequency" of the merged node is
the sum of the frequencies of its children.
HUFFMAN(C)
1 n <- |C|
2 Q <- C
3 for i <- 1 to n-1 do
4 allocate new node z
5 left[z] <- x <- EXTRACT-MIN(Q)
6 right[z] <- y <- EXTRACT-MIN(Q)
7 f[z] <- f[x] + f[y]
8 INSERT(Q,z)
9 return EXTRACT-MIN(Q) |> Return root of T
Figure 16.5 shows a run with the frequencies
as in Figure 16.3. The codeword for a letter
is the sequence of edge labels on the path
from the root to its leaf.
If we assume Q is implemented as a binary
min-heap, we can build the heap in O(n) time,
and EXTRACT-MIN is called 2(n-1) times at a
cost of O(lg n) for a total cost of O(n lg n).
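HUFFMAN maps directly onto Python's heapq
module if we represent a subtree as a nested
(left, right) tuple; a sketch (the tiebreak
counter keeps the heap from ever comparing two
trees):

```python
import heapq
from itertools import count

def huffman(freqs):
    # min-priority queue of (freq, tiebreak,
    # subtree); leaves are characters, internal
    # nodes are (left, right) tuples
    tiebreak = count()
    Q = [(fr, next(tiebreak), ch)
         for ch, fr in freqs.items()]
    heapq.heapify(Q)
    for _ in range(len(freqs) - 1):
        fx, _, x = heapq.heappop(Q)
        fy, _, y = heapq.heappop(Q)
        heapq.heappush(
            Q, (fx + fy, next(tiebreak), (x, y)))
    root = Q[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left
            walk(node[1], prefix + "1")  # right
        else:
            codes[node] = prefix or "0"
    walk(root, "")
    return codes

freqs = {"a": 45, "b": 13, "c": 12,
         "d": 16, "e": 9, "f": 5}
codes = huffman(freqs)
print(sum(freqs[c] * len(codes[c])
          for c in freqs))   # 224
```

The exact codewords can differ from Figure
16.3 when frequencies tie, but the total cost
B(T) is the same: 224 thousand bits on the
example frequencies.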
Correctness of Huffman's algorithm 16.3.4
To prove correctness of Huffman's algorithm,
we first show in Lemma 16.2 below that it has
the greedy-choice property, and then in Lemma
16.3 that it has the optimal substructure
property.
Lemma 16.2 Let C be as above, and let x and y
be two characters with the lowest frequency.
Then there exists an optimal prefix code for
which the codewords for x and y have the same
length and differ only in the last bit.
Proof: We start with a tree T representing an
arbitrary optimal prefix code and modify it to
another tree with optimal prefix code in which
x and y appear as sibling leaves of maximum
depth in the new tree. Doing this proves the
lemma, and shows that merging x and y is a
greedy choice.
Lemma 16.3 Let C and x and y be as in Lemma
16.2. Let C' be the new alphabet with x and y
removed and a new character z added with f[z]
= f[x] + f[y]. Let T' be any tree that
represents an optimal prefix code for C'.
Then the tree T, obtained from T' by replacing
the leaf node for z with an internal node
having x and y as children, represents an
optimal prefix code for C.
Theorem 16.4 Procedure HUFFMAN produces an
optimal prefix code.
Proof: Immediate from Lemmas 16.2 and 16.3.