Chapter 16 Greedy Algorithms 16.0.1
Each step of an optimization algorithm often
has a set of options. We _could_ examine each
one, as in dynamic programming. But for some
problems, always picking the choice that looks
best at the moment gives the right answer.
Def. A greedy algorithm is one that always
makes the choice seeming best at the moment.
Of course, not all optimization problems can
be solved by greedy algorithms.
Section 16.1 presents the activity-selection
problem, which can be solved by a greedy
algorithm. This is done by first finding a
dynamic programming solution, and then showing
that the greedy choice works.
Section 16.2 reviews the theory of greedy
algorithms, including proof of correctness.
Section 16.3 treats an application of greedy
techniques: Huffman codes. Section 16.4 deals
with matroids, for which greedy algorithms
always produce an optimal solution. Section
16.5 presents an application of matroids.
The minimum-spanning-tree algorithms (Kruskal
and Prim) of Chapter 23, Dijkstra's single-
source shortest-path algorithm, and Chvatal's
greedy set-covering heuristic (Chapter 35) are
all examples of greedy algorithms.
16.1 An activity-selection problem 16.1.1
Problem: we have a set S = {a_1,a_2,...,a_n}
of activities that wish to use a resource that
can be used by only one activity at a time. Each
activity has a start time s_i and a finish
time f_i, with 0 <= s_i < f_i < infinity. Two
activities a_i and a_j are compatible if
f_i <= s_j or f_j <= s_i. The problem: find
a maximum-sized subset of mutually compatible
activities. It will be an advantage to order
the activities by increasing finish times:
f_0 <= f_1 <= f_2 <= ... <= f_n < f_(n+1) (1)
Here is an example
i | 1 2 3 4 5 6 7 8 9 10 11
----+---------------------------------
s_i | 1 3 0 5 3 5 6 8 8 2 12
f_i | 4 5 6 7 8 9 10 11 12 13 14
The set {a_3,a_9,a_11} is mutually compatible
but not of maximum size; {a_1,a_4,a_8,a_11} is
a maximum-size mutually compatible set - and
so is {a_2,a_4,a_9,a_11}.
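As a quick sketch in Python (transcribing the table above by hand), we can check mechanically that {a_1,a_4,a_8,a_11} is mutually compatible:

```python
from itertools import combinations

def compatible(ai, aj):
    # Activities are (start, finish); compatible if the intervals don't overlap
    si, fi = ai
    sj, fj = aj
    return fi <= sj or fj <= si

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
act = {i + 1: (s[i], f[i]) for i in range(11)}   # act[i] = a_i

A = [1, 4, 8, 11]   # the set {a_1, a_4, a_8, a_11} from the text
print(all(compatible(act[i], act[j]) for i, j in combinations(A, 2)))  # True
```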
We solve the problem in several steps. First
we think of dynamic programming in which we
combine subproblems to find a solution. But
we only have to make one choice, the greedy
one, leaving only one remaining subproblem.
This allows us to design a recursive solution,
which we convert to an iterative solution.
This illustrates the relation between dynamic
programming and greedy algorithms.
The optimal substructure of the 16.1.2
activity-selection problem
In dynamic programming we first find the
optimal substructure of an optimal solution.
Define the following sets:
S_ij = {a_k in S : f_i <= s_k < f_k <= s_j }
A_ij = a maximal set of compatible activities
in S_ij
i.e. S_ij is the subset of activities
that start after a_i and finish before a_j.
Suppose A_ij contains a_k, and let
A_ik = A_ij intersect S_ik and
A_kj = A_ij intersect S_kj, and thus
A_ij = A_ik U {a_k} U A_kj, and so the
maximal set of activities in S_ij consists of
|A_ij| = |A_ik| + 1 + |A_kj| activities.
The usual cut-and-paste argument shows that
A_ik and A_kj must be maximal solutions for
S_ik and S_kj: if A_ik' were a larger set of
compatible activities from S_ik then
|A_ik'| + 1 + |A_kj| > |A_ik| + 1 + |A_kj|,
contradicting the maximality of A_ij; a
similar argument applies to A_kj.
This characterization of optimal 16.1.3
substructure suggests the use of dynamic pro-
gramming. Let c[i,j] be the size of an optimal
subset of S_ij, which gives the recurrence:
c[i,j] = c[i,k] + c[k,j] + 1
Of course we have to check all possibilities
for a_k to obtain a maximal solution, i.e.:
c[i,j] = 0                           if S_ij = phi
c[i,j] = max {c[i,k] + c[k,j] + 1}   if S_ij != phi
         a_k in S_ij
From this we could write a recursive algorithm
and memoize it or write a bottom-up solution.
But we can do better by using a greedy choice.
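Before moving to the greedy solution, here is a minimal memoized sketch of the recurrence in Python, using the example data from this section. The sentinel activities a_0 and a_(n+1) are an implementation device of this sketch.

```python
from functools import lru_cache

# Activity data from the example table in Section 16.1
s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
n = len(s)

# Sentinels: a_0 finishes at time 0, a_(n+1) starts "at infinity"
S = [(0, 0)] + list(zip(s, f)) + [(float("inf"), float("inf"))]

@lru_cache(maxsize=None)
def c(i, j):
    # c(i, j) = size of a largest compatible subset of S_ij
    best = 0
    for k in range(i + 1, j):
        sk, fk = S[k]
        if S[i][1] <= sk and fk <= S[j][0]:  # a_k lies between a_i and a_j
            best = max(best, c(i, k) + c(k, j) + 1)
    return best

print(c(0, n + 1))  # 4, the size of {a_1, a_4, a_8, a_11}
```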
Making the greedy choice
We could save having to consider all choices
in the formula above if we could choose an
activity without first having to solve all the
subproblems. In fact we only need consider
one choice for an activity: the greedy choice.
For this problem, a greedy choice is the
activity that finishes first. Since the
activities are ordered by increasing finish
times, the first greedy choice is a_1.
After choosing a_1, we only have to solve the
subproblem of finding a maximal subset of
activities starting after f_1 (and there are
no activities that finish before s_1, since
f_1 is the earliest finish time).
Let S_k = {a_i in S : s_i >= f_k}. Then if
we choose a_1, S_1 is the only subproblem left
to solve, so an optimal solution consists of
a_1 and an optimal solution to S_1. Is this
correct? The following theorem says it is.
Theorem 16.1 16.1.4
Let S_k be non-empty and a_m be an activity
in S_k with earliest finish time. Then a_m is
used in _some_ maximal subset of compatible
activities of S_k.
Proof: Let A_k be a maximal subset of
compatible activities in S_k, and let a_j be
the activity in A_k with earliest finish time.
If a_j = a_m, we are done; else let
A_k' = (A_k - {a_j}) U {a_m}. The activities
in A_k' are disjoint since the activities in
A_k are, a_j is the first in A_k to finish,
and f_m <= f_j. Since |A_k'| = |A_k|, A_k' is
a maximal subset of compatible activities of
S_k, and it includes a_m.
So even if we could use dynamic programming to
solve the activity selection problem, we don't
need to (and we haven't proved it has the
overlapping subproblem property). Since the
activities are in order of increasing finish
times, we only have to consider each activity
once. Also, we can use a top-down algorithm
since in choosing a_m we don't have to know
the solution to S_k. Greedy algorithms
typically have this top-down structure.
A recursive greedy algorithm 16.1.5
The procedure RECURSIVE-ACTIVITY-SELECTOR has
as input arrays of start and finish times, the
index k that defines subproblem S_k, and the
size n of the original problem. It returns a
maximal subset of mutually compatible
activities in S_k. We assume the n activities
are ordered by increasing finish times (if not
we sort them in O(n lg n) time). To start, we
add a fictitious activity a_0 with f_0 = 0, so
S_0 = S (= the set of all activities). The
initial call to solve the entire problem is:
RECURSIVE-ACTIVITY-SELECTOR(s, f, 0, n).
RECURSIVE-ACTIVITY-SELECTOR( s, f, k, n )
1 m = k + 1
2 while m <= n and s[m] < f[k] // Find first
3 m = m + 1 // activity in S_k to finish
4 if m <= n
5 return {a_m} union
RECURSIVE-ACTIVITY-SELECTOR(s,f,m,n)
6 else return phi
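A direct Python transcription of the pseudocode (a sketch; the 1-indexing is mimicked with a leading sentinel entry, an implementation choice of this transcription):

```python
def recursive_activity_selector(s, f, k, n):
    # s, f are 1-indexed via a leading sentinel: s[0] is unused, f[0] = 0
    m = k + 1
    while m <= n and s[m] < f[k]:   # find first activity in S_k to finish
        m += 1
    if m <= n:
        return [m] + recursive_activity_selector(s, f, m, n)
    return []

# Example data from Section 16.1, with the fictitious a_0 (f_0 = 0) prepended
s = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(recursive_activity_selector(s, f, 0, 11))  # [1, 4, 8, 11]
```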
Figure 16.1 shows a trace of the algorithm.
The initial call to solve the problem 16.1.6
is: RECURSIVE-ACTIVITY-SELECTOR(s, f, 0, n).
To analyze the run time, we note that over all
the recursive calls each activity is examined
exactly once in line 2, for a contribution of
Theta(n). If the activities are not initially
sorted, there is an added cost of O(n lg n).
An iterative greedy algorithm
The algorithm is almost tail-recursive, so
it is not surprising that there is an easy
conversion to an iterative algorithm, which
accumulates a solution set A to problem S:
GREEDY-ACTIVITY-SELECTOR( s, f )
1 n = s.length
2 A = {a_1}
3 k = 1
4 for m = 2 to n do
5 if s[m] >= f[k] then
6 A = A union {a_m}
7 k = m
8 return A
As with the recursive version, the run time
is Theta(n) (if the activities are sorted).
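The iterative version also transcribes directly to Python (a sketch, again using a leading sentinel entry for 1-indexing):

```python
def greedy_activity_selector(s, f):
    # s and f are 1-indexed via a leading sentinel entry
    n = len(s) - 1
    A = [1]                # a_1 is the first greedy choice
    k = 1                  # k = index of the last activity selected
    for m in range(2, n + 1):
        if s[m] >= f[k]:   # a_m is compatible with the last choice
            A.append(m)
            k = m
    return A

# Example data from Section 16.1
s = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [None, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(greedy_activity_selector(s, f))  # [1, 4, 8, 11]
```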
Elements of the greedy strategy 16.2.1
A greedy algorithm gets an optimal solution
by making a sequence of choices. At each
point we make the choice that is best at that
time. In Sec. 16.1 we went through the steps:
1. Determine the optimal substructure.
2. Develop a recursive solution.
3. Prove that at each step one of the optimal
choices is the greedy choice.
4. Show there is only one non-empty subproblem
left after making the greedy choice.
5. Develop a recursive algorithm implementing
the greedy strategy.
6. Convert the recursive algorithm to an
iterative algorithm.
This process can be shortened as follows,
though there is almost always a less efficient
underlying dynamic programming solution.
1. Convert the problem to making a choice and
leaving one subproblem to solve.
2. Prove there is always an optimal solution
that makes the greedy choice.
3. Prove that having made a greedy choice, the
optimal solution to remaining subproblem can
be combined with the greedy choice to give
an optimal solution to the original problem.
Key ingredients of greedy algorithms: 16.2.2
(1) the greedy-choice property, and
(2) the optimal substructure property.
Greedy-choice property
The greedy-choice property states that we can
obtain a globally optimal solution by making a
locally optimal (greedy) choice. We can make
this choice without first solving sub-problems
- unlike the dynamic programming case in which
we usually need to know the solutions to sub-
problems _before_ making the choice so dynamic
programming algorithms are usually implemented
bottom-up. A greedy algorithm may depend on
choices made so far, but not on future choices
- solutions to subproblems not yet considered.
Thus greedy algorithms are usually implemented
in a top-down manner.
To use a greedy algorithm, as in Theorem 16.1
we must prove that a greedy choice can always
be included in an optimal solution and that it
leaves a smaller, similar problem to solve.
The greedy-choice property often produces
gains in efficiency by making it clear how to
solve the remaining subproblem. For example,
in the activity-selection problem, once we had
sorted the activities by finishing time, we
only had to examine each activity once. This
is typical of greedy algorithms: preprocessing
input or using an appropriate data structure
will allow us to make the greedy choice easily
and lead to an efficient algorithm.
Optimal substructure 16.2.3
A problem exhibits optimal substructure if an
optimal solution to the whole problem has
within it optimal solutions to subproblems.
This is a common property of problems solved
either by dynamic programming or by greedy
algorithms, as we have seen.
We can usually use a more direct approach to
optimal substructure in a greedy algorithm, in
that there is usually just one subproblem when
we have made our choice. Thus, to use a
greedy algorithm, we just need to argue that
the greedy choice for the original problem and
an optimal solution to the subproblem will
yield an optimal solution to the original
problem. This scheme implicitly uses
induction on the (size of the) subproblems to
prove that making the greedy choice at each
step produces a globally optimal solution.
Greedy versus dynamic programming
Since greedy and dynamic programming methods
both share the optimal substructure property,
we must be careful to not use the greedy
method when only dynamic programming applies,
and not use dynamic programming when we can
gain the efficiency of a greedy algorithm. We
look at two versions of an optimization problem:
maximizing the value of items in a knapsack.
The 0-1 knapsack problem: 16.2.4
A thief robbing a store finds n items, the ith
item is worth v_i dollars & weighs w_i pounds,
where v_i and w_i are integers. He wants to
maximize the value of the items he takes, but
he can only carry (an integer) W pounds.
The fractional knapsack problem:
The setup is the same, except the thief can
take fractions of items (rather than the 0-1,
all-or-nothing choice of the previous case).
In the 0-1 case you can think of the items as
being solid whereas in the fractional case you
can think of them as being powder.
Both problems have the optimal substructure
property. For the 0-1 problem, if we remove
an item, say item j, from the knapsack, the
remaining load must be the most valuable for
items weighing at most W - w_j pounds that the
thief can take from the n-1 items excluding
item j. For the fractional problem, if we
remove w pounds of item j, then the remaining
load must be the most valuable for items
weighing at most W - w pounds that the thief
can take from the n-1 original items plus
w_j - w pounds of item j.
We can use a greedy algorithm to solve the
fractional problem but not the 0-1 problem.
For the fractional problem, we compute the
value per pound v_i/w_i and sort these ratios
in decreasing order. Then we make the greedy 16.2.5
choice as follows: at each step we pour as
much of the most valuable-per-pound item yet
remaining into the knapsack as will fit. It
turns out that this strategy satisfies the
greedy-choice property (Exercise 16.2-1).
This greedy strategy doesn't work for the 0-1
problem, as shown in Figure 16.2. The greedy
strategy would tell us to put the 10-pound
Item 1 into the knapsack first, after which
there would only be room for the 20-pound Item
2 or the 30-pound Item 3, both of which lead
to suboptimal solutions. The best solution is
to skip Item 1 and put Items 2 and 3 in the
knapsack, as shown in Figure 16.2 (b).
For the fractional case, we put all of Items
1 and 2 in the knapsack, and then as much of
Item 3 as will fit, which yields an optimal
solution, as shown in Figure 16.2 (c). The
reason that using Item 1 doesn't work in the
0-1 case is that it leads to unused space in
the knapsack, lowering the overall value-per-
pound. To solve the 0-1 problem, we must
compare the maximal solution that includes an
item with the maximal solution that does not
include it. This strategy gives rise to many
overlapping subproblems, a hallmark of dynamic
programming, which can be used to solve the
0-1 problem as indicated in Exercise 16.2-2.
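A sketch of the fractional greedy strategy in Python; the item data is assumed to match Figure 16.2 ($60/10 lb, $100/20 lb, $120/30 lb, W = 50):

```python
def fractional_knapsack(items, W):
    # items: list of (value, weight); returns the max value carryable in W lb
    total = 0.0
    # Greedy choice: highest value per pound first
    for v, w in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(w, W)           # pour in as much as fits
        total += v * take / w
        W -= take
        if W == 0:
            break
    return total

items = [(60, 10), (100, 20), (120, 30)]   # assumed Figure 16.2 data
print(fractional_knapsack(items, 50))      # 240.0
```

With the same ordering but all-or-nothing takes, the thief would pack Items 1 and 2 for only $160, versus the optimal $220 for Items 2 and 3, illustrating why this greedy strategy fails for the 0-1 problem.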
Huffman codes 16.3.1
David Huffman invented these codes in 1952.
These codes can compress data with savings of
20% to 90%. The data are considered to be
characters and the goal is to represent them
by a binary code. Huffman's greedy algorithm
uses a table of character frequencies to build
an optimal binary code representation of them.
For example if we have a 100,000-character
file made up of six characters, we might have
frequencies as given in Figure 16.3, in which
the character "a" occurs 45,000 times.
Among the ways of representing the characters
we consider a binary character codes (or just
code for short) in which there is a unique
binary string for each character. We could
use a fixed-length code, which would need 3
bits to represent six characters a,b,c,d,e,f,
requiring 300,000 bits for the whole file.
We can do better with a variable-length code
which gives frequent characters short codes
and infrequent characters longer codes. If we
use the variable-length codes in Figure 16.3,
we can save about 25.33%, reducing the file
length to 224,000 bits =
(45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4)*1000
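The 224,000-bit figure can be checked directly; a one-off sketch in Python using the frequencies and codeword lengths from the sum above:

```python
# Frequencies (in thousands) and codeword lengths from Figure 16.3
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
var_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}

fixed_bits = sum(freq.values()) * 3                   # 3-bit fixed-length code
var_bits = sum(freq[c] * var_len[c] for c in freq)    # variable-length code
print(fixed_bits, var_bits)  # 300 224, i.e. 300,000 vs 224,000 bits
```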
Prefix codes 16.3.2
A code in which no codeword is a prefix of
any other codeword is called a prefix code.
We restrict attention to prefix codes; it can
be shown that a prefix code can always achieve
the optimal data compression.
Encoding using binary character codes is
simple: just concatenate the codewords. For
example, using the variable-length code of
Figure 16.3, abc is encoded as 0.101.100 =
0101100 (where "." denotes concatenation).
Prefix codes simplify decoding: there is a
unique codeword that is a prefix to the file,
which we replace by its character, and then
repeat the process. For example 001011101 is
0.0.101.1101, which decodes to aabe.
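As a sketch, the prefix property makes this greedy decoding loop correct: the first codeword matched is the only one possible. The table uses the codewords a = 0, b = 101, c = 100, e = 1101 from the examples above; the entries for d and f are assumptions of this sketch.

```python
# Codeword table for the variable-length code (d and f assumed)
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
decode_map = {w: ch for ch, w in code.items()}

def encode(text):
    return ''.join(code[ch] for ch in text)   # just concatenate codewords

def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:       # prefix property: first match is the codeword
            out.append(decode_map[buf])
            buf = ''
    return ''.join(out)

print(encode('abc'))        # 0101100
print(decode('001011101'))  # aabe
```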
A convenient way to decode is to use a binary
tree with the characters in the leaves, and
internal nodes with 0 denoting a branch to the
left and 1 a branch to the right as in Figure
16.4. Note: this is not a binary search tree.
According to Exercise 16.3-1, an optimal code
is always represented by a full binary tree,
in which each nonleaf node has two children.
Thus the fixed-length code tree in Figure 16.4a
is not optimal. So now we restrict attention
to full binary trees representing the alphabet
C of characters (with positive frequency), so
there are |C| leaves and |C|-1 internal nodes.
If tree T represents the code, character c in
C has frequency c.freq in the file & d_T(c) is
the depth in T of leaf c, the number of bits =
B(T) = Sum ( c.freq * d_T(c) )
c in C
Constructing a Huffman code 16.3.3
We show the Huffman code creation algorithm &
then show that it satisfies the greedy-choice
property and has optimal substructure.
As above, C is a set of n characters where
each character c in C has a frequency c.freq.
The algorithm builds the tree T corresponding
to the optimal code in a bottom-up manner,
starting with |C| leaves and performing |C|-1
"merge"s. It uses a min-priority queue Q to
identify the two least frequent subtrees to
merge. The "frequency" of the merged node is
the sum of the frequencies of its children.
Figure 16.5 shows a run with the frequencies
as in Figure 16.3. The codeword for a letter
is the sequence of edge labels from the root.
If we assume Q is implemented as a binary
min-heap, we can build the heap in O(n) time,
and EXTRACT-MIN is called 2(n-1) times at a
cost of O(lg n) for a total cost of O(n lg n).
HUFFMAN(C)
1 n = |C|
2 Q = C
3 for i = 1 to n-1
4 allocate new node z
5 z.left = x = EXTRACT-MIN(Q)
6 z.right = y = EXTRACT-MIN(Q)
7 z.freq = x.freq + y.freq
8 INSERT(Q,z)
9 return EXTRACT-MIN(Q) // Return root of T
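As a sketch, HUFFMAN translates to Python with heapq standing in for the min-priority queue; the counter tie-breaker is an implementation detail of this sketch, not part of the pseudocode.

```python
import heapq
import itertools

def huffman(freqs):
    # freqs: dict mapping character -> frequency
    # Returns (tree, cost) where cost = B(T), the total number of bits
    counter = itertools.count()   # tie-breaker so heapq never compares nodes
    q = [(f, next(counter), (ch, None, None)) for ch, f in freqs.items()]
    heapq.heapify(q)              # O(n) heap build, as in the analysis above
    cost = 0
    for _ in range(len(freqs) - 1):         # |C| - 1 merges
        fx, _, x = heapq.heappop(q)         # two least frequent subtrees
        fy, _, y = heapq.heappop(q)
        cost += fx + fy                     # each merge adds its frequency to B(T)
        heapq.heappush(q, (fx + fy, next(counter), (None, x, y)))
    return heapq.heappop(q)[2], cost

freqs = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
tree, cost = huffman(freqs)
print(cost)  # 224 (thousands of bits, matching the 224,000 figure)
```

The cost accumulator works because B(T) equals the sum of the internal-node frequencies: each character's frequency is counted once per merge on its root-to-leaf path.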
Correctness of Huffman's algorithm 16.3.4
To prove correctness of Huffman's algorithm,
we first show in Lemma 16.2 that it has the
greedy-choice property, and then in Lemma 16.3
that it has the optimal substructure property.
Lemma 16.2 Let C be as above, and let x and y
be two characters with the lowest frequency.
Then there exists an optimal prefix code for
which the codewords for x and y have the same
length and differ only in the last bit.
Proof: We start with a tree T representing an
arbitrary optimal prefix code and modify it to
another tree with optimal prefix code in which
x and y appear as sibling leaves of maximum
depth in the new tree. Doing this proves the
lemma, and shows that the greedy choice is safe.
Lemma 16.3 Let C and x and y be as in Lemma
16.2. Let C' be the new alphabet with x and y
removed & a new character z added with z.freq
= x.freq + y.freq. Let T' be any tree with
an optimal prefix code for C'. The tree T,
obtained from T' by replacing the leaf node
for z with an internal node having x and y as
children, is an optimal prefix tree for C.
Theorem 16.4 Procedure HUFFMAN produces an
optimal prefix code.
Proof: Immediate from Lemmas 16.2 and 16.3.