Chapter 8 Sorting in Linear Time 8.1.1
Merge-sort and heap-sort can sort n numbers
in O(n lg n) time, and quicksort can do it on
average; moreover, for each of these
algorithms there is an input of n numbers
that forces Omega(n lg n) time. These
algorithms are called comparison
sorts since they sort by making comparisons
between numbers.
Section 8.1 proves that any comparison sort
must make Omega(n lg n) comparisons in the
worst case, so Merge-sort and heap-sort are
asymptotically optimal.
Sections 8.2, 8.3, and 8.4 discuss counting
sort, radix sort, and bucket sort, which run
in linear time, and so must use operations
other than comparison to do the sorting.
8.1 Lower bounds for sorting
In a comparison sort, we use one of the
comparisons <, <=, =, >=, or > to test the
relation between two elements a_i and a_j in
the input sequence <a_1, a_2, ..., a_n>. In
this section, we assume that all input numbers
are distinct, so we don't need =, and <, <=,
>=, and > all give equivalent information. So
we can assume all comparisons are <=.
The decision-tree model 8.1.2
We can view sorts in terms of decision trees:
full binary trees representing the comparisons
made by an algorithm on input of a given size.
Figure 8.1 shows the decision tree for the
insertion sort on three elements.
Each internal node is annotated by i:j where
1 <= i,j <= n, & n is the number of elements.
Each leaf is annotated by a permutation
<pi(1), pi(2), ..., pi(n)>. The execution of
the algorithm corresponds to a path from the
root to a leaf. At an internal node marked by
i:j, if the comparison a_i <= a_j is true, we
go down the left subtree, otherwise we go down
the right subtree. When we come to a leaf,
the algorithm has established the ordering:
a_pi(1) <= a_pi(2) <= ... <= a_pi(n). Since
any correct algorithm must be able to produce
any of the n! permutations of n elements,
each permutation must appear as a leaf
reachable from the root. So we consider
only decision trees of this type.
A lower bound for the worst case
The length of the longest path from the root
to a reachable leaf represents the worst-case
number of comparisons the algorithm requires.
So the worst-case number of comparisons is the
height of the algorithm's decision tree.
Theorem 8.1 8.1.3
Any comparison sort algorithm requires
Omega(n lg n) comparisons in the worst case.
Proof: Consider a decision tree of height h
with r reachable leaves corresponding to a
comparison sort on n elements. Since each of
the n! permutations of the input appears as
some leaf, we have n! <= r. Since a binary
tree of height h has <= 2^h leaves, we have:
n! <= r <= 2^h
which, by taking logarithms, gives:
h >= lg(n!) ( lg increases monotonically )
= Theta(n lg n) (by equation 3.18 page 55)
which implies that
h = Omega(n lg n)
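The bound can be checked numerically (a small Python sketch; the name min_comparisons is ours, not from the text). A binary decision tree with at least n! reachable leaves must have height at least ceiling(lg(n!)):

```python
import math

def min_comparisons(n: int) -> int:
    """Information-theoretic lower bound on worst-case comparisons:
    the smallest height of a binary tree with at least n! leaves."""
    return math.ceil(math.log2(math.factorial(n)))

# lg(3!) ~ 2.58, so sorting 3 elements needs at least 3 comparisons
# in the worst case -- exactly the height of the tree in Figure 8.1.
print(min_comparisons(3))    # -> 3
print(min_comparisons(10))   # -> 22
```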
8.2 Counting sort 8.2.1
Counting sort assumes that each of the n
inputs is an integer in the range 0 to k. If
k = O(n), the sort runs in Theta(n) time. The
basic idea is: for each input x, determine the
number of elements less than x, then place x
directly in the output array. We modify this
a bit to handle elements with the same value.
We assume A[1..n] is the input array, and so
length[A] = n. B[1..n] will be the output
array, & C[0..k] is used for working storage.
The first & third loops run in time Theta(k),
the second & fourth loops run in Theta(n) time
for a total of Theta(n+k) = Theta(n) if k =
O(n). This algorithm is stable: numbers with
the same value stay in the same order as their
input order. Figure 8.2 shows counting sort.
COUNTING-SORT(A,B,k)
1 for i <- 0 to k
2 do C[i] <- 0
3 for j <- 1 to length[A]
4 do C[A[j]] <- C[A[j]] + 1
5 |> C[i] now contains the number of elements
equal to i
6 for i <- 1 to k
7 do C[i] <- C[i] + C[i - 1]
8 |> C[i] now contains the number of elements
less than or equal to i
9 for j <- length[A] downto 1
10 do B[C[A[j]]] <- A[j]
11 C[A[j]] <- C[A[j]] - 1
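The pseudocode translates directly to 0-based Python (a sketch; counting_sort is our name). With 0-based arrays we decrement the counter before placing, which is equivalent to lines 10-11 above:

```python
def counting_sort(A, k):
    """Stably sort a list of integers in the range 0..k."""
    n = len(A)
    B = [0] * n                # output array
    C = [0] * (k + 1)          # working storage
    for x in A:                # C[i] = number of elements equal to i
        C[x] += 1
    for i in range(1, k + 1):  # C[i] = number of elements <= i
        C[i] += C[i - 1]
    for x in reversed(A):      # scanning right to left keeps it stable
        C[x] -= 1              # 0-based: decrement, then place
        B[C[x]] = x
    return B

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
# -> [0, 0, 2, 2, 3, 3, 3, 5]
```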
8.3 Radix sort 8.3.1
Radix sort was used by card-sorting machines.
The cards had 80 columns, each column of which
had 12 places to punch a hole. The cards
could be sorted into one of 12 bins according
to the hole in a particular column. The cards
could then be gathered up bin by bin, stacked
up and sorted again on another column. To
sort numbers, the radix or base used was 10.
A d-digit number would require d columns.
Counterintuitively, radix sort sorts on the
least significant digit first. To sort a
collection
of d-digit numbers requires d passes through
the card sorter. Figure 8.3 page 171 shows
the radix sort of seven 3-digit numbers. On a
computer, it is essential that each digit sort
be stable. Radix sort is sometimes used to
sort data that has multiple "key" fields, such
as dates (day, month, year) and names (first,
middle, last). For dates, the sort must start
with the day, then the month, and end with
the year. For names, sort by middle name
first, then first name, then last name.
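For instance, (year, month, day) tuples can be sorted with three stable passes, least significant field first (a small sketch; the dates are made up for illustration, and Python's built-in sort is stable):

```python
dates = [(2021, 3, 14), (2020, 3, 2), (2021, 1, 14), (2020, 12, 2)]
for field in (2, 1, 0):                  # day first, then month, then year
    dates.sort(key=lambda d: d[field])   # each pass is stable
print(dates)
# -> [(2020, 3, 2), (2020, 12, 2), (2021, 1, 14), (2021, 3, 14)]
```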
RADIX-SORT below assumes that each element in
the n-element array A has d digits, with digit
1 being the least significant, and d the most.
RADIX-SORT(A,d)
1 for i <- 1 to d
2 do use a stable sort to sort A on digit i
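RADIX-SORT with counting sort as the stable digit sort might look like this in Python (a sketch; radix_sort and its inner helper are our names, and digits are extracted arithmetically). The input below is the seven 3-digit numbers from Figure 8.3:

```python
def radix_sort(A, d, base=10):
    """Sort non-negative integers of at most d base-`base` digits,
    least significant digit first."""
    for i in range(d):                     # pass i sorts on digit i
        digit = lambda x: (x // base ** i) % base
        C = [0] * base                     # counting sort on digit i
        for x in A:
            C[digit(x)] += 1
        for b in range(1, base):
            C[b] += C[b - 1]
        B = [0] * len(A)
        for x in reversed(A):              # stability is essential
            C[digit(x)] -= 1
            B[C[digit(x)]] = x
        A = B
    return A

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# -> [329, 355, 436, 457, 657, 720, 839]
```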
Lemma 8.3 8.3.2
Given n d-digit numbers in base k, RADIX-SORT
correctly sorts them in O(d(n + k)) time.
Proof: RADIX-SORT can be proved correct by
induction on i (Exercise 8.3-3). Counting
sort can be used to sort the k values of the
i-th digit in O(n + k) time (and is stable),
so the total time would be O(d(n + k)).
So it runs in linear time when d is constant
and k = O(n). More generally, we have:
Lemma 8.4 Given n b-bit numbers & any positive
integer r <= b, RADIX-SORT correctly sorts the
numbers in Theta((b/r)(n + 2^r)) time.
Proof: For a value r <= b, we view each number
as having d = ceiling(b/r) digits of r bits
each. Each digit is an integer in the range
0 to 2^r - 1, so that we can use a counting
sort with k = 2^r - 1. Each pass of counting
sort takes Theta(n + k) = Theta(n + 2^r), for
Theta(d(n + 2^r)) = Theta((b/r)(n + 2^r))
total time for d passes. For example, if we
have b = 32 bit keys with 4 8-bit digits, then
r = 8, k = 2^r - 1 = 255, and d = b/r = 4.
If b < floor(lg n), choosing r = b yields a
running time of (b/b)(n + 2^b) = Theta(n).
If b >= floor(lg n), choosing r = floor(lg n)
gives the best running time of Theta(bn/lg n).
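The trade-off in Lemma 8.4 can be tabulated numerically (a sketch of the cost model only, constant factors dropped; radix_cost is our name). The ceiling in d = ceiling(b/r) can shift the exact discrete minimizer a little away from floor(lg n):

```python
import math

def radix_cost(n, b, r):
    """Operations suggested by Lemma 8.4: ceil(b/r) counting-sort
    passes, each costing about n + 2^r."""
    return math.ceil(b / r) * (n + 2 ** r)

n, b = 1_000_000, 32
for r in (1, 8, 16, 32):
    print(r, radix_cost(n, b, r))
# moderate r (near floor(lg n) = 19) beats both tiny and huge digits
```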
8.4 Bucket sort 8.4.1
Bucket sort, like counting sort, assumes that
the input is in a special form: n numbers
uniformly distributed over the interval [0,1).
It works by dividing [0,1) into n equal-sized
intervals or buckets, and then distributes the
n numbers into the buckets. Since the numbers
are uniformly distributed, we expect only a
few numbers to fall into each bucket. To get
the output, we sort the numbers in each bucket
and then go through the buckets in order,
listing the numbers in each one.
BUCKET-SORT assumes the input is an n-element
array A of real numbers, and 0 <= A[i] < 1 for
each i. It uses an auxiliary array B[0..n-1]
of linked lists (the buckets). Figure 8.4 on
page 175 shows how it sorts 10 input numbers.
BUCKET-SORT(A)
1 n <- length[A]
2 for i <- 1 to n
3 do insert A[i] into list B[floor(nA[i])]
4 for i <- 0 to n - 1
5 do sort list B[i] with insertion sort
6 concatenate the lists B[0], B[1],..., B[n-1]
together in order
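A direct Python rendering of BUCKET-SORT (a sketch; bucket_sort is our name, with the insertion sort of line 5 written out). The input is the 10 numbers from Figure 8.4:

```python
def bucket_sort(A):
    """Sort reals drawn from [0, 1) using n equal-width buckets."""
    n = len(A)
    B = [[] for _ in range(n)]           # n empty buckets
    for x in A:
        B[int(n * x)].append(x)          # bucket floor(n * A[i])
    out = []
    for bucket in B:
        for j in range(1, len(bucket)):  # insertion-sort each bucket
            key, i = bucket[j], j - 1
            while i >= 0 and bucket[i] > key:
                bucket[i + 1] = bucket[i]
                i -= 1
            bucket[i + 1] = key
        out.extend(bucket)               # concatenate in order
    return out

print(bucket_sort([.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]))
```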
To analyze the running time, first note that
all lines except line 5 take Theta(n).
8.4.2
To analyze the cost of the insertion sorts,
let n_i denote the number of elements placed
in bucket B[i]. Since insertion sort runs in
quadratic time, the total running time is:
                  n-1
T(n) = Theta(n) + Sum O( (n_i)^2 )
                  i=0
Taking expectations of both sides and using
linearity of expectation, we have
                        n-1
E[T(n)] = E[ Theta(n) + Sum O( (n_i)^2 ) ]
                        i=0
                     n-1
        = Theta(n) + Sum E[ O( (n_i)^2 ) ]
                     i=0
                     n-1
        = Theta(n) + Sum O( E[ (n_i)^2 ] )
                     i=0
We next show for i = 0, 1,..., n-1 that
E[ (n_i)^2 ] = 2 - 1/n.
This will show the total expected running time
is Theta(n) + n x O(2 - 1/n) = Theta(n).
Proof that E[ (n_i)^2 ] = 2 - 1/n 8.4.3
Define indicator random variables
X_ij = I{A[j] falls in bucket i}
for i = 0, 1,..., n-1 & j = 1, 2,..., n
n
Thus: n_i = Sum X_ij
j = 1
To compute E[ (n_i)^2 ], expand the square
and regroup terms:
n
E[(n_i)^2] = E[(Sum X_ij)^2]
j=1
n n
= E[ Sum Sum X_ij X_ik ]
j=1 k=1
n n n
= E[Sum(X_ij)^2 + Sum Sum X_ij X_ik]
j=1 j=1 k=1
k != j
n n n
= Sum E[(X_ij)^2] + Sum Sum E[X_ij X_ik]
j=1 j=1 k=1
k != j
by linearity of E[]. Now we compute the E[]'s
X_ij is 1 with probability 1/n & 0 otherwise:
E[(X_ij)^2] = E[X_ij] = 1*(1/n) + 0*(1 - 1/n)
            = 1/n
8.4.4
If k != j, X_ij and X_ik are independent, so
E[X_ij X_ik] = E[X_ij]*E[X_ik]
= (1/n)*(1/n)
= 1/n^2
Putting these E[]'s into the equation above,
n n n
E[(n_i)^2] = Sum 1/n + Sum Sum 1/n^2
j=1 j=1 k=1
k != j
= n*(1/n) + n(n-1)*(1/n^2)
= 1 + (n-1)/n
= 2 - 1/n
as was to be proved.
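The identity can also be checked empirically (a quick simulation sketch; the names are ours). With n = 20 the predicted value is 2 - 1/20 = 1.95:

```python
import random

def mean_sq_count(n, trials, seed=1):
    """Average of (n_0)^2: throw n uniform keys into n buckets and
    square the count landing in bucket 0, averaged over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        n_0 = sum(1 for _ in range(n) if rng.randrange(n) == 0)
        total += n_0 * n_0
    return total / trials

n = 20
print(mean_sq_count(n, 100_000), 2 - 1 / n)  # both close to 1.95
```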