Chapter 8  Sorting in Linear Time

8.1.1  Merge sort and heapsort can sort n numbers in O(n lg n) time, and quicksort does so on average; moreover, for each of these algorithms there are inputs of n numbers on which it takes Omega(n lg n) time. These algorithms are called comparison sorts because they determine the sorted order only by comparing elements. Section 8.1 proves that any comparison sort must make Omega(n lg n) comparisons in the worst case, so merge sort and heapsort are asymptotically optimal. Sections 8.2, 8.3, and 8.4 examine counting sort, radix sort, and bucket sort, which run in linear time under suitable assumptions on the input, and which therefore must use operations other than comparisons to do the sorting.

8.1 Lower bounds for sorting

In a comparison sort, we use one of the tests <, <=, =, >=, or > to determine the relative order of two elements a_i and a_j of the input sequence <a_1, a_2, ..., a_n>. In this section we assume that all input numbers are distinct, so the test a_i = a_j is useless, and the tests <, <=, >=, and > all yield the same information about the relative order of a_i and a_j. We may therefore assume that every comparison has the form a_i <= a_j.

The decision-tree model

8.1.2  We can view a comparison sort in terms of a decision tree: a full binary tree that represents the comparisons the algorithm makes on an input of a given size. Figure 8.1 shows the decision tree for insertion sort operating on three elements. Each internal node is labeled i:j, where 1 <= i, j <= n and n is the number of input elements; each leaf is labeled by a permutation pi of <1, 2, ..., n>. An execution of the algorithm corresponds to a path from the root down to a leaf: at an internal node labeled i:j, if the comparison a_i <= a_j is true we take the left subtree, otherwise the right. When we reach a leaf, the algorithm has established the ordering a_pi(1) <= a_pi(2) <= ... <= a_pi(n). Since any correct algorithm must be able to produce each of the n! permutations of n elements, each permutation must appear at a leaf reachable from the root. We consider only decision trees of this type.

A lower bound for the worst case

The length of the longest path from the root to any reachable leaf is the worst-case number of comparisons the algorithm performs, so the worst-case number of comparisons equals the height of the algorithm's decision tree.

Theorem 8.1

8.1.3  Any comparison sort algorithm requires Omega(n lg n) comparisons in the worst case.

Proof: Consider a decision tree of height h with r reachable leaves corresponding to a comparison sort on n elements. Since each of the n! permutations of the input appears as some reachable leaf, we have n! <= r. Since a binary tree of height h has at most 2^h leaves,

    n! <= r <= 2^h

Taking logarithms (lg is monotonically increasing),

    h >= lg(n!)
       = Theta(n lg n)    (by equation 3.18, page 55)

which implies h = Omega(n lg n).

8.2 Counting sort

8.2.1  Counting sort assumes that each of the n inputs is an integer in the range 0 to k. When k = O(n), the sort runs in Theta(n) time. The basic idea: for each input x, determine the number of elements less than x, and use that count to place x directly into its position in the output array. This scheme must be modified slightly to handle elements with equal values. We assume the input is an array A[1..n], so length[A] = n; B[1..n] holds the sorted output, and C[0..k] provides working storage. In the pseudocode below, the first and third loops run in Theta(k) time and the second and fourth in Theta(n) time, for a total of Theta(n + k), which is Theta(n) when k = O(n). Counting sort is stable: numbers with the same value appear in the output in the same order as they appear in the input. Figure 8.2 shows counting sort in action.

COUNTING-SORT(A, B, k)
 1  for i <- 0 to k
 2       do C[i] <- 0
 3  for j <- 1 to length[A]
 4       do C[A[j]] <- C[A[j]] + 1
 5  |> C[i] now contains the number of elements equal to i
 6  for i <- 1 to k
 7       do C[i] <- C[i] + C[i - 1]
 8  |> C[i] now contains the number of elements less than or equal to i
 9  for j <- length[A] downto 1
10       do B[C[A[j]]] <- A[j]
11          C[A[j]] <- C[A[j]] - 1
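To make the pseudocode concrete, here is a minimal Python sketch of counting sort. The function name and the 0-based indexing are our own (the pseudocode above uses 1-based arrays); the three phases correspond to lines 1-4, 6-7, and 9-11.

def counting_sort(A, k):
    # Stably sort a list A of integers in the range 0..k.
    n = len(A)
    B = [0] * n                  # output array
    C = [0] * (k + 1)            # C[v] counts keys equal to v
    for x in A:                  # lines 3-4: tally each key
        C[x] += 1
    for v in range(1, k + 1):    # lines 6-7: prefix sums,
        C[v] += C[v - 1]         # now C[v] = number of keys <= v
    for x in reversed(A):        # lines 9-11: place keys, back to front
        C[x] -= 1                # 0-based: last free slot for key x
        B[C[x]] = x
    return B

Scanning A from right to left in the final loop is what makes the sort stable: equal keys are placed from the highest remaining slot downward, so they keep their input order. For example, counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5) returns [0, 0, 2, 2, 3, 3, 3, 5].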
8.3 Radix sort

8.3.1  Radix sort was the method used by card-sorting machines. A punched card had 80 columns, and each column had 12 places in which a hole could be punched. The machine could distribute cards into one of 12 bins according to which place was punched in a given column; the cards could then be gathered up bin by bin, restacked, and sorted on another column. To sort decimal numbers, the radix (base) used was 10, and a d-digit number occupied d columns. Counterintuitively, radix sort sorts on the least significant digit first, so sorting a collection of d-digit numbers requires d passes through the card sorter. Figure 8.3 (page 171) shows the radix sort of seven 3-digit numbers. On a computer, it is essential that each digit sort be stable.

Radix sort is sometimes used to sort records keyed by multiple fields, such as dates (day, month, year) or names (first, middle, last). To sort dates into year-month-day order, sort first on the day, then on the month, and finally on the year. Likewise, to order names by last, then first, then middle name, sort on the middle name first, then the first name, and finally the last name.

RADIX-SORT below assumes that each element of the n-element array A has d digits, with digit 1 the least significant and digit d the most significant.

RADIX-SORT(A, d)
1  for i <- 1 to d
2       do use a stable sort to sort array A on digit i

Lemma 8.3

8.3.2  Given n d-digit numbers in which each digit can take up to k possible values, RADIX-SORT correctly sorts them in O(d(n + k)) time.

Proof: RADIX-SORT can be proved correct by induction on the digit being sorted (Exercise 8.3-3). Counting sort, which is stable, sorts on the i-th digit (with its k possible values) in Theta(n + k) time, so the total time for d passes is O(d(n + k)). Radix sort therefore runs in linear time when d is constant and k = O(n).

More generally, we have:

Lemma 8.4

Given n b-bit numbers and any positive integer r <= b, RADIX-SORT correctly sorts these numbers in Theta((b/r)(n + 2^r)) time.

Proof: For a value r <= b, we view each number as having d = ceiling(b/r) digits of r bits each. Each digit is an integer in the range 0 to 2^r - 1, so we can use counting sort with k = 2^r - 1. Each pass of counting sort takes Theta(n + k) = Theta(n + 2^r) time, so the d passes take Theta(d(n + 2^r)) = Theta((b/r)(n + 2^r)) time in total.

For example, with b = 32-bit keys and r = 8, each key is viewed as d = b/r = 4 digits of 8 bits each, and k = 2^r - 1 = 255. If b < floor(lg n), choosing r = b yields a running time of Theta((b/b)(n + 2^b)) = Theta(n), since 2^b < n. If b >= floor(lg n), choosing r = floor(lg n) gives the best running time to within a constant factor: Theta(bn / lg n).
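A minimal Python sketch of RADIX-SORT under the setup of Lemma 8.4: keys are nonnegative integers of at most b bits, viewed as ceiling(b/r) digits of r bits each, with a stable counting sort on each digit. The function name and the digit extraction by shifting and masking are our own choices, not from the text.

def radix_sort(A, b, r):
    # A digit is r bits, so it ranges over 0 .. 2^r - 1.
    mask = (1 << r) - 1
    d = -(-b // r)                       # d = ceiling(b/r) passes
    for i in range(d):                   # least significant digit first
        shift = i * r
        # Stable counting sort on digit i (the r bits starting at `shift`).
        C = [0] * (mask + 1)
        for x in A:
            C[(x >> shift) & mask] += 1
        for v in range(1, mask + 1):     # prefix sums: C[v] = # keys <= v
            C[v] += C[v - 1]
        B = [0] * len(A)
        for x in reversed(A):            # right-to-left keeps it stable
            v = (x >> shift) & mask
            C[v] -= 1
            B[C[v]] = x
        A = B
    return A

For example, sorting seven 3-digit numbers viewed as 10-bit keys, 4 bits at a time (so d = ceiling(10/4) = 3 passes): radix_sort([329, 457, 657, 839, 436, 720, 355], b=10, r=4) returns [329, 355, 436, 457, 657, 720, 839].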
8.4 Bucket sort

8.4.1  Bucket sort, like counting sort, assumes something special about the input: here, that it consists of n numbers distributed uniformly over the interval [0, 1). It works by dividing [0, 1) into n equal-sized subintervals, or buckets, and distributing the n numbers into the buckets. Since the inputs are uniformly distributed, we expect only a few numbers to fall into each bucket. To produce the output, we sort the numbers within each bucket and then go through the buckets in order, listing the elements in each. BUCKET-SORT assumes the input is an n-element array A of real numbers with 0 <= A[i] < 1 for each i, and it uses an auxiliary array B[0..n-1] of linked lists (the buckets). Figure 8.4 (page 175) shows how it sorts 10 input numbers.

BUCKET-SORT(A)
1  n <- length[A]
2  for i <- 1 to n
3       do insert A[i] into list B[floor(n * A[i])]
4  for i <- 0 to n - 1
5       do sort list B[i] with insertion sort
6  concatenate the lists B[0], B[1], ..., B[n-1] together in order

To analyze the running time, first observe that all lines except line 5 take Theta(n) time in total.

8.4.2  To analyze the cost of the insertion sorts, let n_i denote the number of elements placed in bucket B[i]. Since insertion sort runs in quadratic time, the total running time is

    T(n) = Theta(n) + Sum_{i=0}^{n-1} O((n_i)^2)

Taking expectations of both sides and using linearity of expectation, we have

    E[T(n)] = E[ Theta(n) + Sum_{i=0}^{n-1} O((n_i)^2) ]
            = Theta(n) + Sum_{i=0}^{n-1} E[O((n_i)^2)]
            = Theta(n) + Sum_{i=0}^{n-1} O(E[(n_i)^2])

We next show that E[(n_i)^2] = 2 - 1/n for i = 0, 1, ..., n-1. It follows that the total expected running time is Theta(n) + n * O(2 - 1/n) = Theta(n).

Proof that E[(n_i)^2] = 2 - 1/n

8.4.3  Define the indicator random variables

    X_ij = I{A[j] falls in bucket i}    for i = 0, 1, ..., n-1 and j = 1, 2, ..., n

so that

    n_i = Sum_{j=1}^{n} X_ij

To compute E[(n_i)^2], expand the square and regroup terms:

    E[(n_i)^2] = E[ (Sum_{j=1}^{n} X_ij)^2 ]
               = E[ Sum_{j=1}^{n} Sum_{k=1}^{n} X_ij X_ik ]
               = E[ Sum_{j=1}^{n} (X_ij)^2 + Sum_{j=1}^{n} Sum_{k != j} X_ij X_ik ]
               = Sum_{j=1}^{n} E[(X_ij)^2] + Sum_{j=1}^{n} Sum_{k != j} E[X_ij X_ik]

by linearity of expectation. Now we evaluate the two kinds of expectations. X_ij is 1 with probability 1/n and 0 otherwise, so

    E[(X_ij)^2] = E[X_ij] = 1 * (1/n) + 0 * (1 - 1/n) = 1/n

8.4.4  If k != j, the variables X_ij and X_ik are independent, so

    E[X_ij X_ik] = E[X_ij] * E[X_ik] = (1/n)(1/n) = 1/n^2

Substituting these values into the equation above,

    E[(n_i)^2] = Sum_{j=1}^{n} 1/n + Sum_{j=1}^{n} Sum_{k != j} 1/n^2
               = n * (1/n) + n(n - 1) * (1/n^2)
               = 1 + (n - 1)/n
               = 2 - 1/n

as was to be proved.
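A minimal Python sketch of BUCKET-SORT under the stated assumption 0 <= A[i] < 1. The function name is ours; Python lists stand in for the linked lists B[0..n-1], and each bucket is sorted with the insertion sort the pseudocode calls for.

def bucket_sort(A):
    n = len(A)
    B = [[] for _ in range(n)]           # n empty buckets
    for x in A:                          # lines 2-3: distribute into
        B[int(n * x)].append(x)          # bucket floor(n * A[i])
    for bucket in B:                     # lines 4-5: insertion-sort each
        for j in range(1, len(bucket)):
            key = bucket[j]
            i = j - 1
            while i >= 0 and bucket[i] > key:
                bucket[i + 1] = bucket[i]
                i -= 1
            bucket[i + 1] = key
    # line 6: concatenate the buckets in order
    return [x for bucket in B for x in bucket]

For example, bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]) returns [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]; with ten inputs, no bucket receives more than three elements, which is the behavior the expected-time analysis above relies on.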