Part III Data Structures

Introduction

Sets are important both in mathematics, where they may be infinite and do not change, and in computer science, where they are always finite and can grow or shrink over time, and are hence called "dynamic". These are the kinds of sets treated in Part III. Basic operations on sets include insertion, deletion, and membership query. A set that supports these operations is a "dictionary". Other operations may be required, depending on the application.

Elements of a dynamic set

The sets consist of objects that may have a "key" value; if the keys are distinct, we may consider the set as consisting of the keys. An object may contain other values used by the set operations, such as pointers to other objects. In an actual application an object may also carry "satellite data", which the set operations do not use, so we ignore it. The keys are often drawn from a totally ordered set (e.g. real numbers, or strings in alphabetical order). Then we can speak of the minimum element, or of the next element after a given one.

Operations on dynamic sets

There are two kinds of operations: 1) queries, which return information, and 2) modifying operations, which change the set. The first two operations below are modifying operations; the rest are queries. An application may require only some of them.

INSERT(S,x) - adds the element pointed to by x to S. We assume that the fields of x used by the set operations have been initialized.
DELETE(S,x) - removes the element pointed to by x from S. To "delete by key" k, we first call SEARCH(S,k) to obtain x.
SEARCH(S,k) - returns a pointer to an element x in S with key[x] = k, or NIL if none exists.
MINIMUM(S) - returns a pointer to the element of a totally ordered set S with the smallest key.
MAXIMUM(S) - returns a pointer to the element of a totally ordered set S with the largest key.
SUCCESSOR(S,x) - returns a pointer to the next larger element in a totally ordered set S, or NIL if x is the maximum element.
PREDECESSOR(S,x) - returns a pointer to the next smaller element in a totally ordered set S, or NIL if x is the minimum element.

SUCCESSOR and PREDECESSOR can also be used with sets that have duplicate keys. For a set with n keys, calling MINIMUM and then SUCCESSOR n - 1 times lists the keys in sorted order. The time taken by an operation is measured in terms of n, the size of the set; all the operations above run in O(lg n) time on a red-black tree (Chap. 13).

Overview of Part III

Chapter 6 introduced heaps, which support INSERT, MAXIMUM, and a limited DELETE.
Chapter 10 reviews stacks, queues, linked lists, and rooted trees; it also shows how to implement objects and pointers with arrays.
Chapter 11 discusses hash tables, which support SEARCH in O(1) expected time under reasonable assumptions, although the worst case is Theta(n).
Chapter 12 reviews binary search trees, whose operations take O(lg n) time on average, but Theta(n) time in the worst case, when the tree becomes unbalanced.
Chapter 13 discusses red-black trees, which always remain balanced, so their operations take O(lg n) time in the worst case.
Chapter 14 discusses how red-black trees can be augmented to provide specialized information with good running times.

Chapter 10 Elementary Data Structures

Chapter 10 treats simple data structures used to implement dynamic sets: stacks, queues, linked lists, and rooted trees. It also shows how to synthesize pointers with arrays.

10.1 Stacks and Queues

With stacks and queues, the element removed by the DELETE operation is prespecified. In a stack, the deleted element is the last one added: a last-in, first-out, or LIFO, policy. In a queue, the deleted element is the first one that was added: a first-in, first-out, or FIFO, policy. We show array implementations of both; linked-list implementations also work.

Stacks

With a stack, INSERT and DELETE are usually called PUSH and POP, respectively.
As shown in Figure 10.1, the array implementation S[1..n] of a stack has an attribute top[S]; the stack contains the elements S[1..top[S]], where S[1] is the bottom and S[top[S]] is the top. The stack is empty when top[S] = 0, which can be tested by the STACK-EMPTY query operation. Popping an empty stack causes underflow, and pushing onto a stack with top[S] = n causes overflow; both are errors, although the code below tests only for underflow.

Implementations of the stack operations (each one takes O(1) time):

    STACK-EMPTY(S)
        if top[S] = 0
            then return TRUE
            else return FALSE

    PUSH(S,x)
        top[S] <- top[S] + 1
        S[top[S]] <- x

    POP(S)
        if STACK-EMPTY(S)
            then error "underflow"
            else top[S] <- top[S] - 1
                 return S[top[S] + 1]

Queues

With a queue, INSERT and DELETE are usually called ENQUEUE and DEQUEUE, respectively. As shown in Figure 10.2, the array implementation Q[1..n] of a queue has three attributes: head[Q], the index of the next element to be removed; tail[Q], the index at which the next element will be inserted; and length[Q] = n, the size of the array. The queue holds at most n - 1 elements, at locations head[Q], head[Q] + 1, ..., tail[Q] - 1, where the indices "wrap around" from n back to 1.

Q is empty if head[Q] = tail[Q] (both are 1 initially), and full if head[Q] = tail[Q] + 1, taking wrap-around into account (so Q is also full when head[Q] = 1 and tail[Q] = n). Dequeuing an empty queue is called underflow and enqueuing into a full queue is called overflow; both are errors (not checked in the code below). Both operations take O(1) time.

    ENQUEUE(Q,x)
        Q[tail[Q]] <- x
        if tail[Q] = length[Q]
            then tail[Q] <- 1
            else tail[Q] <- tail[Q] + 1

    DEQUEUE(Q)
        x <- Q[head[Q]]
        if head[Q] = length[Q]
            then head[Q] <- 1
            else head[Q] <- head[Q] + 1
        return x

The deque (double-ended queue) is a related data structure that allows insertion and deletion at both ends.
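As an illustration only, the wrap-around idea used for the queue can be extended to a deque. The sketch below is one possible Python version, not the textbook's code: the class name ArrayDeque and its method names are hypothetical, indices are 0-based rather than 1-based, and, unlike the pseudocode above, it checks for overflow and underflow.

```python
class ArrayDeque:
    """Fixed-capacity deque on a circular array (0-based, unlike Q[1..n])."""

    def __init__(self, n):
        self.slots = [None] * n   # holds at most n - 1 elements
        self.head = 0             # index of the front element
        self.tail = 0             # index one past the back element

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # Full when advancing tail would make it collide with head.
        return (self.tail + 1) % len(self.slots) == self.head

    def push_back(self, x):
        """Insert at the back; this mirrors ENQUEUE."""
        if self.is_full():
            raise OverflowError("overflow")
        self.slots[self.tail] = x
        self.tail = (self.tail + 1) % len(self.slots)

    def push_front(self, x):
        """Insert at the front by moving head one slot backward."""
        if self.is_full():
            raise OverflowError("overflow")
        self.head = (self.head - 1) % len(self.slots)
        self.slots[self.head] = x

    def pop_front(self):
        """Remove from the front; this mirrors DEQUEUE."""
        if self.is_empty():
            raise IndexError("underflow")
        x = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return x

    def pop_back(self):
        """Remove from the back by moving tail one slot backward."""
        if self.is_empty():
            raise IndexError("underflow")
        self.tail = (self.tail - 1) % len(self.slots)
        return self.slots[self.tail]
```

Each method does a constant amount of index arithmetic, so all four operations take O(1) time. The modulo operator replaces the explicit "if tail[Q] = length[Q]" test of the pseudocode.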
Exercise 10.1-5 on page 204 asks for an array implementation of a deque, similar to the one above for an ordinary queue, in which all four operations run in O(1) time.

10.2 Linked Lists

The elements of a linked list are in a linear order determined by pointers from one element to the next. Linked lists support all the dynamic set operations, though not necessarily efficiently.

Figure 10.3 shows a doubly linked list. It has an attribute head[L], which points to the first element of the list (some implementations also keep a tail[L] attribute pointing to the last element). Each element is an object with a key field and two pointer fields, prev and next (and possibly satellite data or a handle to it). In a singly linked list, we omit the prev field. A list can be sorted (the linear order is the same as the order of its keys) or unsorted. In a circular list, the prev field of the head points to the tail, and the next field of the tail points to the head. The code below is for an unsorted, doubly linked list.

Searching a linked list

LIST-SEARCH(L,k) returns either a pointer to the first element with key k, or NIL if no such element exists. It takes Theta(n) time in the worst case on a list with n elements.

    LIST-SEARCH(L,k)
        x <- head[L]
        while x != NIL and key[x] != k
            do x <- next[x]
        return x

Inserting into a linked list

For an element x with key[x] set, LIST-INSERT puts x at the head of the list; it takes O(1) time.

    LIST-INSERT(L,x)
        next[x] <- head[L]
        if head[L] != NIL
            then prev[head[L]] <- x
        head[L] <- x
        prev[x] <- NIL

Deleting from a linked list

If we have a pointer x to an element in the list, LIST-DELETE(L,x) removes it (we may first have to call LIST-SEARCH(L,k) to find x). It runs in O(1) time, or Theta(n) time in the worst case if x must also be found.
    LIST-DELETE(L,x)
        if prev[x] != NIL
            then next[prev[x]] <- next[x]
            else head[L] <- next[x]
        if next[x] != NIL
            then prev[next[x]] <- prev[x]

Sentinels

LIST-DELETE would be simpler if we could omit the boundary tests, as shown below:

    LIST-DELETE'(L,x)
        next[prev[x]] <- next[x]
        prev[next[x]] <- prev[x]

A sentinel is a dummy element that simplifies boundary conditions. Let nil[L] denote an object that has next and prev fields like any other element, but whose key is not used. We use it wherever the code referred to NIL, and it takes the place of head[L]: next[nil[L]] points to the head and prev[nil[L]] points to the tail of L. The code for search is then the same apart from those substitutions, delete is as shown above, and insert simplifies slightly, as shown below. See Figure 10.4.

    LIST-SEARCH'(L,k)
        x <- next[nil[L]]
        while x != nil[L] and key[x] != k
            do x <- next[x]
        return x

    LIST-INSERT'(L,x)
        next[x] <- next[nil[L]]
        prev[next[nil[L]]] <- x
        next[nil[L]] <- x
        prev[x] <- nil[L]

Sentinels rarely reduce asymptotic running times, but they can reduce the constant factors, which matters inside (nested) loops that run n or n^2 times. The book uses sentinels only when they truly simplify the code, as above. If there are many small lists, sentinels can significantly increase storage costs.

10.3 Implementing Pointers and Objects

In this section, we show two ways to implement pointers and objects in environments that do not provide them, such as machine language or FORTRAN 77.

A multiple-array representation of objects.

The first way is to use "parallel" arrays, one array for each field of the objects. Figure 10.5 shows the linked list of Figure 10.3(a) with key, prev, and next fields; a pointer x is just a common index into the three arrays representing those fields. Thus key[x] means the same thing whether x is a pointer in a linked list or an array index.

A single-array representation of objects.
In real computer memory, objects are usually stored in contiguous locations, with a pointer giving the address of the first location and the fields accessed by adding offsets to it. We can mimic this with a single array, assuming for simplicity that each field is the same size as an array element. Figure 10.6 shows how a single array can store the linked list of Figures 10.3(a) and 10.5. Note that the single-array method can also store objects of different sizes.

Allocating and freeing objects

To simplify storage management, we now assume that all objects are the same size, so we can use the multiple-array representation. To insert a new key into a list, we must allocate an unused object to hold it. In some systems a garbage collector determines which objects are unused; here we manage the storage ourselves by keeping the unused objects in a linked list.

Suppose the parallel arrays each have length m and the dynamic set contains n <= m objects. The remaining m - n objects are free; we keep them in a singly linked list called the free list, whose head is kept in a global variable called "free". In general, a non-empty dynamic set is intertwined with the free list, as shown in Figures 10.7 and 10.8.

The free list is a stack: the next object allocated is the last one that was freed. List versions of PUSH and POP implement the freeing and allocating of objects in the free list (Figure 10.7):

    ALLOCATE-OBJECT()
        if free = NIL
            then error "out of space"
            else x <- free
                 free <- next[x]
                 return x

    FREE-OBJECT(x)
        next[x] <- free
        free <- x

10.4 Representing rooted trees

We consider binary trees and trees whose nodes can have arbitrarily many children.

Binary trees

Each node x of a binary tree T has pointers p[x], left[x], and right[x], pointing to its parent, left child, and right child. If p[x] = NIL, then x is the root; if left[x] = NIL, then x has no left child, and similarly for the right child. root[T] is a pointer to the root, and root[T] = NIL if T is empty. Figure 10.9 shows a binary tree.
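The binary-tree representation just described can be sketched in Python as follows. This is a minimal illustration rather than the textbook's code: the class TreeNode and the helpers attach_left, attach_right, depth, and preorder are hypothetical names, and None plays the role of NIL.

```python
class TreeNode:
    """One node of a binary tree with parent, left, and right pointers."""

    def __init__(self, key):
        self.key = key
        self.p = None       # parent; None plays the role of NIL
        self.left = None    # left child, or None if there is none
        self.right = None   # right child, or None if there is none


def attach_left(parent, child):
    """Make child the left child of parent, keeping p consistent."""
    parent.left = child
    child.p = parent


def attach_right(parent, child):
    """Make child the right child of parent, keeping p consistent."""
    parent.right = child
    child.p = parent


def depth(x):
    """Count the edges from x up to the root by following p pointers."""
    d = 0
    while x.p is not None:
        x = x.p
        d += 1
    return d


def preorder(x):
    """Return the keys in the subtree rooted at x: node, left, right."""
    if x is None:
        return []
    return [x.key] + preorder(x.left) + preorder(x.right)


# Build a small tree: 1 is the root, 2 and 3 its children, 4 under 2.
root = TreeNode(1)
two, three, four = TreeNode(2), TreeNode(3), TreeNode(4)
attach_left(root, two)
attach_right(root, three)
attach_left(two, four)
```

The parent pointer is what makes depth possible without a traversal from the root; the child pointers are what make preorder possible.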
Rooted trees with unbounded branching

If a node can have any number of children, we cannot reserve a fixed number of child pointers in each node, so we use the left-child, right-sibling method shown in Figure 10.10. Each node x has pointers p[x] to its parent, left-child[x] to its leftmost child, and right-sibling[x] to its next sibling to the right. As before, root[T] is a pointer to the root. If left-child[x] = NIL, then x has no children; if right-sibling[x] = NIL, then x is the rightmost child of its parent.

Other tree representations

Chapter 6 represented a heap by an array. The trees of Chapter 21 are traversed only toward the root, so they use parent pointers but no child pointers.
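To make the left-child, right-sibling method from this section concrete, here is a small Python sketch. It is an assumption-laden illustration, not the textbook's code: the class LcrsNode and the helpers add_child and children are hypothetical names, and new children are added at the left end only because that takes O(1) time.

```python
class LcrsNode:
    """Tree node with unbounded branching, using three pointers per node."""

    def __init__(self, key):
        self.key = key
        self.p = None              # parent, or None for the root
        self.left_child = None     # leftmost child, or None if no children
        self.right_sibling = None  # next sibling to the right, or None


def add_child(parent, child):
    """Make child the new leftmost child of parent in O(1) time."""
    child.p = parent
    child.right_sibling = parent.left_child
    parent.left_child = child


def children(x):
    """Return the keys of x's children from left to right."""
    keys = []
    c = x.left_child
    while c is not None:
        keys.append(c.key)
        c = c.right_sibling
    return keys


# A root with three children; each call inserts at the left end.
r = LcrsNode("r")
for k in ["c", "b", "a"]:
    add_child(r, LcrsNode(k))
```

Every node stores exactly three pointers regardless of how many children it has, so a tree with n nodes uses O(n) space.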