Part III Data Structures
Introduction
Sets are important both in mathematics, where
they are unchanging (and may be infinite), and
in computer science, where they are always
finite and can grow, shrink, or otherwise
change over time; such sets are called
"dynamic". These are the kinds of sets treated
in Part III.
Basic operations on sets include insertion,
deletion, and membership query. A set that
supports these operations is a "dictionary".
Different operations may be required depending
on the application.
Elements of a dynamic set
The sets consist of objects that may have a
"key" value; if the keys are distinct, we may
consider the set as consisting of the keys.
The object may contain other values used by
the set operations, such as pointers to other
objects. The object may also have "satellite
data" in an actual application, which is not
used by the set operations, so we ignore it.
The keys are often from a totally ordered set
(e.g. real numbers or strings in alphabetical
order). Then we can speak of the minimum or
the next element.
Operations on dynamic sets
There are two kinds of operations: 1) queries
which return information, and 2) modifying
operations, which change the set. The first
two operations below are modifying operations;
the rest are queries. An application may only
require some of them.
INSERT(S,x) - adds the element pointed to by x
to S. We assume that fields in x used by
the set operations have been initialized.
DELETE(S,x) - removes the element pointed to
by x from S. If we want to "delete by key",
k, we call SEARCH(S,k) first to obtain x.
SEARCH(S,k) - returns a pointer to an element
x in S with key[x] = k, or NIL if none exists.
MINIMUM(S) - returns a pointer to the element
in a totally ordered set with smallest key.
MAXIMUM(S) - returns a pointer to the element
in a totally ordered set with largest key.
SUCCESSOR(S,x) - returns a pointer to the
next larger element in a totally ordered set
S, or NIL, if x is the maximum element.
PREDECESSOR(S,x) - returns a pointer to the
next smaller element in a totally ordered
set S, or NIL, if x is the minimum element.
SUCCESSOR and PREDECESSOR are often extended
to sets with duplicate keys. For a set with
n keys, calling MINIMUM and then SUCCESSOR
n - 1 times lists the keys in sorted order.
The time taken to run an operation is measured
in terms of n, the size of the set; all the
operations above run in O(lg n) time on a
red-black tree (Chap. 13).
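As a rough illustration of the interface above,
here is a minimal Python sketch. It assumes
distinct keys kept in a sorted list; the names
SortedKeySet and in_order are hypothetical,
keys stand in for element pointers, and None
plays the role of NIL. Insertion and deletion
take O(n) time here because of shifting, so
this is not the O(lg n) red-black tree of
Chapter 13.

```python
import bisect


class SortedKeySet:
    """Toy dynamic set of distinct keys kept in a sorted Python list.

    Assumption: keys are mutually comparable and None is never a key.
    """

    def __init__(self):
        self.keys = []

    def insert(self, k):
        i = bisect.bisect_left(self.keys, k)
        if i == len(self.keys) or self.keys[i] != k:
            self.keys.insert(i, k)  # O(n): later keys are shifted

    def delete(self, k):
        i = bisect.bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            self.keys.pop(i)

    def search(self, k):
        i = bisect.bisect_left(self.keys, k)
        return i < len(self.keys) and self.keys[i] == k

    def minimum(self):
        return self.keys[0] if self.keys else None

    def maximum(self):
        return self.keys[-1] if self.keys else None

    def successor(self, k):
        # First key strictly greater than k, or None (NIL).
        i = bisect.bisect_right(self.keys, k)
        return self.keys[i] if i < len(self.keys) else None

    def predecessor(self, k):
        # Last key strictly smaller than k, or None (NIL).
        i = bisect.bisect_left(self.keys, k)
        return self.keys[i - 1] if i > 0 else None


def in_order(s):
    """List the keys by calling minimum once, then successor repeatedly."""
    out = []
    k = s.minimum()
    while k is not None:
        out.append(k)
        k = s.successor(k)
    return out
```

Calling in_order on a set with n keys makes one
minimum call and n successor calls, the last of
which returns None and ends the loop.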
Overview of Part III
In Chapter 6, we have seen heaps, which
support INSERT, MAXIMUM, and a limited DELETE.
Chapter 10 reviews stacks, queues, linked
lists, and rooted trees; it also shows how to
implement objects with pointers by arrays.
Chapter 11 discusses hash tables, which
support SEARCH in O(1) expected time (Theta(n)
in the worst case).
Chapter 12 reviews binary search trees, whose
operations take O(lg n) expected time on a
randomly built tree, but Theta(n) time in the
worst case, when the tree is unbalanced.
Chapter 13 discusses red-black trees, which
always remain balanced, so their operations
take O(lg n) in the worst case.
Chapter 14 discusses how red-black trees can
be augmented in order to provide specialized
information with good running times.
Chapter 10 Elementary Data Structures
Chapter 10 treats simple data structures used
to implement dynamic sets: stacks, queues,
linked lists, and rooted trees. It is also
shown how to synthesize pointers with arrays.
10.1 Stacks and Queues
With stacks and queues the element removed by
the DELETE operation is prespecified. In a
stack, the deleted element is the last one
added - a last-in, first-out, or LIFO policy.
In a queue, the deleted element is the first
one that was added - first-in, first-out, or
FIFO policy. We show array implementations of
both; linked list implementations also work.
Stacks
With a stack INSERT and DELETE are usually
called PUSH and POP respectively. As shown in
Figure 10.1, the array implementation S[1..n]
of a stack has an attribute top[S], where the
stack contains elements S[1..top[S]] (where
S[1] is the bottom and S[top[S]] is the top).
The stack is empty when top[S] = 0, which can
be tested by the STACK-EMPTY query operation.
Popping an empty stack causes underflow, an
error, and pushing an element onto a stack
with top[S] = n causes overflow, an error (but
not tested in the code below).
Implementations of stack operations; each one
takes O(1) time:
STACK-EMPTY(S)
1 if top[S] = 0
2 then return TRUE
3 else return FALSE
PUSH(S,x)
1 top[S] <- top[S] + 1
2 S[top[S]] <- x
POP(S)
1 if STACK-EMPTY(S)
2 then error "underflow"
3 else top[S] <- top[S] - 1
4 return S[top[S] + 1]
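The stack pseudocode above might look as
follows in Python. This is a sketch under some
assumptions: the class name ArrayStack is
hypothetical, the array is 0-based (so top
counts the stored elements and the top element
sits at S[top - 1]), and an overflow check is
added that the pseudocode omits.

```python
class ArrayStack:
    """Fixed-capacity stack stored in a preallocated list."""

    def __init__(self, n):
        self.S = [None] * n  # storage S[0..n-1]
        self.top = 0         # number of elements currently on the stack

    def is_empty(self):
        return self.top == 0

    def push(self, x):
        if self.top == len(self.S):
            raise OverflowError("overflow")
        self.S[self.top] = x
        self.top += 1

    def pop(self):
        if self.is_empty():
            raise IndexError("underflow")
        self.top -= 1
        return self.S[self.top]
```

Each method does a constant amount of work,
matching the O(1) bound stated above.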
Queues
With a queue INSERT and DELETE are usually
called ENQUEUE and DEQUEUE respectively. As
shown in Figure 10.2, the array implementation
Q[1..n] of a queue has three attributes:
head[Q], the next element to be removed,
tail[Q], the place to insert the next element,
and length[Q] = n, the size of the array.
Figure 10.2 shows the array implementation of
a queue with at most n-1 elements at locations
head[Q], head[Q]+1, ..., tail[Q]-1, which
"wrap-around" the end of Q[1..n].
Q is empty if head[Q] = tail[Q] (both = 1
initially), and full if head[Q] = tail[Q] + 1
or if head[Q] = 1 and tail[Q] = length[Q].
Dequeuing an empty queue is called underflow
and enqueuing into a full queue is called
overflow; both are errors (not checked in the
code below). Both operations take O(1) time.
ENQUEUE(Q,x)
1 Q[tail[Q]] <- x
2 if tail[Q] = length[Q]
3 then tail[Q] <- 1
4 else tail[Q] <- tail[Q] + 1
DEQUEUE(Q)
1 x <- Q[head[Q]]
2 if head[Q] = length[Q]
3 then head[Q] <- 1
4 else head[Q] <- head[Q] + 1
5 return x
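A possible Python rendering of the circular
queue is sketched below. The assumptions: the
class name ArrayQueue is hypothetical, indices
are 0-based, the wrap-around uses modular
arithmetic instead of the explicit comparison
with length[Q], and the overflow and underflow
checks that the pseudocode skips are included.

```python
class ArrayQueue:
    """Circular queue in a list of n slots; holds at most n - 1 elements."""

    def __init__(self, n):
        self.Q = [None] * n
        self.head = 0  # index of the next element to remove
        self.tail = 0  # index where the next element will be stored

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # One slot is always left unused so that "full" and "empty" differ.
        return (self.tail + 1) % len(self.Q) == self.head

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("overflow")
        self.Q[self.tail] = x
        self.tail = (self.tail + 1) % len(self.Q)

    def dequeue(self):
        if self.is_empty():
            raise IndexError("underflow")
        x = self.Q[self.head]
        self.head = (self.head + 1) % len(self.Q)
        return x
```

Leaving one slot unused is what limits the
queue to n - 1 elements: if all n slots could
be filled, head = tail would be ambiguous.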
The deque (double-ended queue) is a related
data structure that allows insertion and
deletion at both ends. Exercise 10.1-5 on
page 204 asks for an array implementation,
similar to the one above for an ordinary
queue, in which all four operations run in
O(1) time.
10.2 Linked Lists
The elements of a linked list are in a linear
order determined by pointers from one element
to the next. Linked lists support all dynamic
set operations (not necessarily efficiently).
Figure 10.3 shows a doubly linked list, which
has an attribute head[L] which points to the
first element in the list (a tail[L] attribute
pointing to the last element may also exist in
some implementations). Each element is an
object with a key field and two pointer fields
prev and next (and possibly satellite data or
a handle to it). For a singly-linked list, we
omit the prev field.
A list can be sorted (the linear order is the
same as the order of its keys) or unsorted.
In a circular list, the prev field of the head
points to the tail, and the next field of the
tail points to the head. The code below is
for an unsorted, doubly linked list.
Searching a linked list
LIST-SEARCH(L,k) returns either a pointer to
the first element with key = k, or NIL if no
such element exists. It is Theta(n) in the
worst case (on a list with n elements).
LIST-SEARCH(L,k)
1 x <- head[L]
2 while x != NIL and key[x] != k
3 do x <- next[x]
4 return x
Inserting into a linked list
For an element x with key[x] set, LIST-INSERT
puts x at the head of the list; it is O(1).
LIST-INSERT(L,x)
1 next[x] <- head[L]
2 if head[L] != NIL
3 then prev[head[L]] <- x
4 head[L] <- x
5 prev[x] <- NIL
Deleting from a linked list
If we have a pointer x to an element in the
list, LIST-DELETE(L,x) will remove it (we may
have to call LIST-SEARCH(L,k) to find x). It
runs in O(1) time (Theta(n) to also find x).
LIST-DELETE(L,x)
1 if prev[x] != NIL
2 then next[prev[x]] <- next[x]
3 else head[L] <- next[x]
4 if next[x] != NIL
5 then prev[next[x]] <- prev[x]
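The three list procedures above can be sketched
in Python as follows, for an unsorted, doubly
linked list without a sentinel. The class names
Node and DoublyLinkedList and the helper keys()
are hypothetical; None plays the role of NIL.

```python
class Node:
    """List element with a key and prev/next pointers."""

    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def search(self, k):
        # Linear scan: Theta(n) in the worst case.
        x = self.head
        while x is not None and x.key != k:
            x = x.next
        return x

    def insert(self, x):
        # Splice x onto the front of the list in O(1).
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x
        x.prev = None

    def delete(self, x):
        # Unlink x in O(1), given a pointer to it.
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev

    def keys(self):
        out = []
        x = self.head
        while x is not None:
            out.append(x.key)
            x = x.next
        return out
```

To delete by key, one would call search first
and pass the result to delete, for Theta(n)
time overall in the worst case.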
Sentinels
LIST-DELETE would be simpler if we could omit
the boundary tests, as shown below:
LIST-DELETE'(L,x)
1 next[prev[x]] <- next[x]
2 prev[next[x]] <- prev[x]
A sentinel is a dummy element that simplifies
boundary conditions. We let nil[L] denote an
object with no key set, but with next and prev
set. We use it instead of NIL and in place of
head[L]. So next[nil[L]] points to the head
and prev[nil[L]] points to the tail of L.
With those modifications the code for search
is essentially unchanged, delete is as shown
above, and insert simplifies slightly, as
shown below. See Figure 10.4.
LIST-SEARCH'(L,k)
1 x <- next[nil[L]]
2 while x != nil[L] and key[x] != k
3 do x <- next[x]
4 return x
LIST-INSERT'(L,x)
1 next[x] <- next[nil[L]]
2 prev[next[nil[L]]] <- x
3 next[nil[L]] <- x
4 prev[x] <- nil[L]
Sentinels rarely reduce asymptotic run times,
but they can reduce the constant factors -
important in (nested) loops with n or n^2 run
times. Sentinels are only used in this book
when they truly simplify code, as above. If
there are many small lists, sentinels can
significantly increase storage costs.
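The sentinel versions of the list procedures
might be written in Python as below. This is a
sketch, not a definitive implementation: the
class names Node and SentinelList are
hypothetical, and the sentinel is simply a
Node whose key is None.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class SentinelList:
    """Circular, doubly linked list with a sentinel node self.nil."""

    def __init__(self):
        self.nil = Node(None)
        # In an empty list the sentinel points to itself in both directions.
        self.nil.next = self.nil
        self.nil.prev = self.nil

    def search(self, k):
        # Returns the sentinel itself (not None) when k is absent.
        x = self.nil.next
        while x is not self.nil and x.key != k:
            x = x.next
        return x

    def insert(self, x):
        # No boundary test: nil.next is always a valid node.
        x.next = self.nil.next
        self.nil.next.prev = x
        self.nil.next = x
        x.prev = self.nil

    def delete(self, x):
        # No boundary tests: x.prev and x.next are never None.
        x.prev.next = x.next
        x.next.prev = x.prev
```

Note that delete must not be called on the
sentinel itself, and that callers compare the
result of search against nil rather than NIL.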
10.3 Implementing Pointers and Objects
In this section, we show two ways to implement
pointers and objects in environments that
don't have them, such as machine language or
FORTRAN 77.
A multiple-array representation of objects.
The first way is to use "parallel" arrays to
represent each of the fields of a set of
objects with one array for each field. Figure
10.5 shows the linked list of Figure 10.3(a)
with key, prev, and next fields; a pointer x
is just a common index into the three arrays
representing those fields. Thus key[x] means
the same thing whether x is a pointer in a
linked list or an array index.
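To make the multiple-array idea concrete, here
is a small Python sketch. The particular layout
(which indices hold which elements) is a
hypothetical one chosen for illustration, the
arrays are 0-based, and NIL is represented by
-1; next_ has a trailing underscore only to
avoid shadowing Python's built-in next.

```python
NIL = -1  # assumed marker for "no object"; any invalid index would do

# One array per field; index x refers to the same object in all three.
key   = [None, 4,   1,   None, 16,  None, 9,   None]
next_ = [NIL,  2,   NIL, NIL,  1,   NIL,  4,   NIL]
prev  = [NIL,  4,   1,   NIL,  6,   NIL,  NIL, NIL]
head = 6  # the list is 9 -> 16 -> 4 -> 1


def list_keys(x):
    """Follow next_ indices from x and collect the keys."""
    out = []
    while x != NIL:
        out.append(key[x])
        x = next_[x]
    return out


def list_search(x, k):
    """Return the index of the first object with key k, or NIL."""
    while x != NIL and key[x] != k:
        x = next_[x]
    return x
```

The unused slots (indices 0, 3, 5, and 7 here)
are the ones a free list would link together.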
A single-array representation of objects.
In real computer memory, objects are usually
stored in contiguous locations, with a pointer
pointing to the first location and the fields
accessed by offsets. We can mimic this with a
single array, assuming for simplicity that the
fields are all the same size as an array
element. Figure 10.6 shows how a single array
can store the linked list of Figures 10.3(a)
and 10.5. Note that the single-array method
can be used to store different sized objects.
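The single-array scheme can be sketched the
same way. Here each object is assumed to
occupy three consecutive slots with offsets
0, 1, and 2 for key, next, and prev; the
offsets, the helper names, and the placement
of objects in the array are all hypothetical
choices for illustration.

```python
NIL = -1                   # assumed marker for "no object"
KEY, NEXT, PREV = 0, 1, 2  # field offsets within an object

A = [None] * 12            # room for four 3-slot objects


def store(p, k, nxt, prv):
    """Write an object whose first slot is index p."""
    A[p + KEY] = k
    A[p + NEXT] = nxt
    A[p + PREV] = prv


# The list 9 -> 16 -> 4 -> 1, with objects placed out of order on purpose.
store(9, 9, 3, NIL)
store(3, 16, 0, 9)
store(0, 4, 6, 3)
store(6, 1, NIL, 0)
head = 9


def list_keys(p):
    """A pointer is the index of an object's first slot; add an offset per field."""
    out = []
    while p != NIL:
        out.append(A[p + KEY])
        p = A[p + NEXT]
    return out
```

Because a pointer is just the index of an
object's first slot, nothing in this scheme
requires every object to span the same number
of slots.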
Allocating and freeing objects
To simplify storage management, we now assume
that all objects are the same size, and so we
can use the multiple-array representation.
To insert a new key into a list we need to
allocate storage for it from a linked list of
unused objects. In some systems a garbage
collector keeps track of unused space.
Suppose the parallel arrays each have length
m and the dynamic set contains n <= m objects.
The remaining m - n objects are free - we keep
them in a singly linked list called the free
list, whose head is kept in a global variable
called "free". In general a non-empty dynamic
set is intertwined with the free list, as
shown in Figures 10.7 and 10.8.
The free list is a stack: the next object
allocated is the last that was freed. PUSH
and POP implement the freeing and allocating
of objects in the free list (Figure 10.7):
ALLOCATE-OBJECT()
1 if free = NIL
2 then error "out of space"
3 else x <- free
4 free <- next[x]
5 return x
FREE-OBJECT(x)
1 next[x] <- free
2 free <- x
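A minimal Python sketch of the two procedures
follows, assuming the multiple-array
representation with 0-based indices, NIL = -1,
and m >= 1. The class name ObjectPool is
hypothetical, and the free list is assumed to
start out holding all m slots in index order.

```python
NIL = -1  # assumed marker for "no object"


class ObjectPool:
    """Parallel arrays of length m plus a free list threaded through next."""

    def __init__(self, m):
        self.key = [None] * m
        self.prev = [NIL] * m
        # Initially every slot is free: 0 -> 1 -> ... -> m-1 -> NIL.
        self.next = list(range(1, m)) + [NIL]
        self.free = 0

    def allocate_object(self):
        # Pop the head of the free list.
        if self.free == NIL:
            raise MemoryError("out of space")
        x = self.free
        self.free = self.next[x]
        return x

    def free_object(self, x):
        # Push x onto the head of the free list.
        self.next[x] = self.free
        self.free = x
```

Only the next array is used by the free list;
the key and prev fields of a free object are
left untouched until it is allocated again.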
10.4 Representing rooted trees
We consider binary trees and trees whose
nodes can have arbitrarily many children.
Binary trees
Each node x of a binary tree T has pointers
p[x], left[x], and right[x] to its parent,
left child, and right child. If p[x] = NIL,
x is the root; x has no left child if
left[x] = NIL, and similarly for the right
child. root[T] is a pointer to the root, and
root[T] = NIL if T is empty. Figure 10.9
shows a binary tree.
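The binary-tree representation could be
sketched in Python like this. The names
TreeNode, attach_left, attach_right, and depth
are hypothetical helpers, and None plays the
role of NIL.

```python
class TreeNode:
    """Binary-tree node with parent, left-child, and right-child pointers."""

    def __init__(self, key):
        self.key = key
        self.p = None
        self.left = None
        self.right = None


def attach_left(parent, child):
    """Make child the left child of parent, keeping both pointers in sync."""
    parent.left = child
    child.p = parent


def attach_right(parent, child):
    """Make child the right child of parent, keeping both pointers in sync."""
    parent.right = child
    child.p = parent


def depth(x):
    """Number of parent links followed from x to reach the root."""
    d = 0
    while x.p is not None:
        x = x.p
        d += 1
    return d
```

Keeping the parent pointer allows upward walks
such as depth, at the cost of one extra field
per node.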
Rooted trees with unbounded branching
If a node can have any number of children, it
won't work to have a pointer to each of them,
so we use the left-child, right-sibling method
shown in Figure 10.10. Each node has pointers
p[x] to its parent, left-child[x] to its
leftmost child, and right-sibling[x] to its next
sibling to the right. As before root[T] is a
pointer to the root. If left-child[x] = NIL,
x has no children; if right-sibling[x] = NIL,
x is the right-most child of its parent.
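The left-child, right-sibling scheme can be
sketched in Python as below. The names
LCRSNode, add_child, and children are
hypothetical, and None plays the role of NIL.
Note that add_child inserts at the left end of
the sibling list, which is one convenient O(1)
choice and not the only possible one.

```python
class LCRSNode:
    """Node with three pointers, however many children it has."""

    def __init__(self, key):
        self.key = key
        self.p = None
        self.left_child = None
        self.right_sibling = None


def add_child(parent, child):
    """Make child the new leftmost child of parent in O(1) time."""
    child.p = parent
    child.right_sibling = parent.left_child
    parent.left_child = child


def children(x):
    """Collect x's children from left to right by following sibling links."""
    out = []
    c = x.left_child
    while c is not None:
        out.append(c)
        c = c.right_sibling
    return out
```

Every node uses the same three pointer fields,
so a tree with n nodes needs only O(n) space
regardless of how the children are distributed.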
Other tree representations
Chapter 6 represented a heap by a single array
with no pointers, and the trees of Chapter 21
are traversed only toward the root, so they
keep parent pointers but no child pointers.