Chapter 12 Binary Search Trees
Dynamic sets are sets that can grow or shrink
(by adding or removing elements). Search
trees are data structures that support many of
the dynamic-set operations: SEARCH, MINIMUM,
MAXIMUM, PREDECESSOR, SUCCESSOR, INSERT, and
DELETE. Thus a search tree can be used both
as a dictionary and as a priority queue.
Basic operations on a binary search tree (BST)
take O(h) time, where h is the height of the
tree. In the worst case this is Theta(lg(n))
for a complete binary tree with n nodes and
Theta(n) for a "linear" tree (a chain of n
nodes). The expected height of a randomly
built tree is O(lg(n)), so on such a tree the
operations take O(lg(n)) time on average.
There are variations on BSTs whose worst-case
performance can be guaranteed to be good.
12.1 What is a binary search tree?
Each node contains a key value, and pointers
(possibly NIL) left, right (to children), and
p (to parent), in addition to satellite data.
See Figure 12.1, page 254, for examples. The
keys satisfy the binary search tree property:
For any node x, if y is a node in the left
subtree of x, key[y] <= key[x]; if y is a node
in the right subtree of x, key[y] >= key[x].
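The node layout and the property can be
sketched in Python. This is a minimal
illustration, not code from the text: the
names Node, link, and is_bst are assumptions,
and None plays the role of NIL. Note that
comparing each node only with its own children
is not enough; the check carries bounds down
from every ancestor.

```python
class Node:
    """A BST node: key plus left, right, and parent (p) pointers."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None


def link(parent, left=None, right=None):
    """Hypothetical helper: attach children and set their parent pointers."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.p = parent
    return parent


def is_bst(x, lo=float("-inf"), hi=float("inf")):
    """True if every key in the subtree rooted at x lies in [lo, hi]
    and the same holds recursively with bounds tightened by x.key."""
    if x is None:
        return True
    if not (lo <= x.key <= hi):
        return False
    return is_bst(x.left, lo, x.key) and is_bst(x.right, x.key, hi)


good = link(Node(5), link(Node(3), Node(2), Node(5)), link(Node(7), None, Node(8)))
# 6 sits in the left subtree of 5 even though 6 >= 3 holds locally.
bad = link(Node(5), link(Node(3), None, Node(6)), Node(7))
print(is_bst(good), is_bst(bad))  # True False
```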
We can visit the nodes of a BST, T, in sorted
order by keys using an inorder tree walk.
(Preorder and postorder tree walks are defined
similarly.) The following prints the keys in
sorted order with the call
INORDER-TREE-WALK(root[T]):
INORDER-TREE-WALK(x)
1 if x not = NIL
2    then INORDER-TREE-WALK(left[x])
3         print key[x]
4         INORDER-TREE-WALK(right[x])
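A direct Python transcription of the walk, as
a sketch. The Node class and the visit
parameter (which defaults to print) are
assumptions added so the result can be
collected in a list; None plays the role of
NIL.

```python
class Node:
    """Minimal node: key plus left and right children."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def inorder_tree_walk(x, visit=print):
    """Visit the keys of the subtree rooted at x in sorted order."""
    if x is not None:
        inorder_tree_walk(x.left, visit)
        visit(x.key)
        inorder_tree_walk(x.right, visit)


root = Node(5, Node(3, Node(2), Node(5)), Node(7, None, Node(8)))
out = []
inorder_tree_walk(root, out.append)
print(out)  # [2, 3, 5, 5, 7, 8]
```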
Theorem 12.1 If x is the root of an n-node
subtree, then the call INORDER-TREE-WALK(x)
takes Theta(n) time.
Proof: Let T(n) denote the time taken by
INORDER-TREE-WALK(x) when x is the root of an
n-node subtree. Then T(0) = c, a positive
constant time to do the test for x being NIL.
For n > 0, suppose the left subtree of x has
k nodes, so the right subtree has n - k - 1
nodes, so T(n) = T(k) + T(n - k - 1) + d for
some positive constant d. We show that
T(n) = (c + d)n + c by the substitution
method. For n = 0, (c + d)*0 + c = c = T(0).
For n > 0,
T(n) = T(k) + T(n - k - 1) + d
     = ((c+d)k + c) + ((c+d)(n-k-1) + c) + d
     = (c+d)n + c - (c+d) + c + d
     = (c+d)n + c,
which is Theta(n).
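The closed form can be checked numerically.
The sketch below takes c = d = 1, so T(n)
counts the invocations of the walk and the
formula predicts 2n + 1 for any tree shape.
The names Node, count_calls, chain, and
balanced are illustrative assumptions.

```python
class Node:
    """Minimal node; None plays the role of NIL."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def count_calls(x):
    """Number of invocations the inorder walk makes on the subtree at x."""
    if x is None:
        return 1  # T(0) = c, with c = 1
    # T(n) = T(k) + T(n - k - 1) + d, with d = 1
    return count_calls(x.left) + count_calls(x.right) + 1


def chain(n):
    """Right-leaning 'linear' tree on keys 1..n."""
    root = None
    for key in range(n, 0, -1):
        root = Node(key, right=root)
    return root


def balanced(lo, hi):
    """Balanced tree on keys lo..hi inclusive."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, balanced(lo, mid - 1), balanced(mid + 1, hi))


# Each row shows n, the count for a chain, the count for a balanced
# tree, and the predicted value (c + d)n + c = 2n + 1.
for n in (0, 1, 7, 100):
    print(n, count_calls(chain(n)), count_calls(balanced(1, n)), 2 * n + 1)
```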
12.2 Querying a binary search tree
Query operations on a BST: SEARCH (the most
common), MINIMUM, MAXIMUM, SUCCESSOR, and
PREDECESSOR. Each can be performed in time
O(h), where h is the height of the tree.
Searching
Given a pointer to the root of a BST and a
key, k, TREE-SEARCH returns a pointer to a
node with key k if one exists, or NIL if not.
TREE-SEARCH(x,k)
1 if x = NIL or k = key[x]
2    then return x
3 if k < key[x]
4    then return TREE-SEARCH(left[x],k)
5    else return TREE-SEARCH(right[x],k)
The search traces a path downward from the
root, as in Figure 12.2, so the number of
nodes encountered, and hence the running
time, is O(h), where h is the height of the
tree.
The same procedure can be written iteratively,
with a while loop in place of the recursion.
On most computers this version is more
efficient:
ITERATIVE-TREE-SEARCH(x,k)
1 while x not = NIL and k not = key[x]
2    do if k < key[x]
3          then x <- left[x]
4          else x <- right[x]
5 return x
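Both versions translate almost line for line
into Python. The following is a sketch under
the assumption of a simple Node class with
key, left, and right fields; the example tree
is built by hand for illustration.

```python
class Node:
    """Minimal node; None plays the role of NIL."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def tree_search(x, k):
    """Recursive search: node with key k in the subtree at x, or None."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)


def iterative_tree_search(x, k):
    """Same search with the recursion replaced by a loop."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


root = Node(15,
            Node(6, Node(3, Node(2), Node(4)),
                 Node(7, None, Node(13, Node(9)))),
            Node(18, Node(17), Node(20)))

found = iterative_tree_search(root, 13)   # path 15 -> 6 -> 7 -> 13
missing = iterative_tree_search(root, 10)  # falls off the tree below 9
print(found.key, missing)  # 13 None
```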
Minimum and maximum
A node in a BST whose key is a minimum can be
found by following left pointers until a NIL
is encountered. The following procedure
returns a pointer to the node with minimum
key in a BST rooted at x. It is correct by
the binary-search-tree property.
TREE-MINIMUM(x)
1 while left[x] not = NIL
2    do x <- left[x]
3 return x
Similarly, the following procedure returns a
pointer to the node with maximum key:
TREE-MAXIMUM(x)
1 while right[x] not = NIL
2    do x <- right[x]
3 return x
Both procedures run in O(h) time because,
like TREE-SEARCH, each traces a single
downward path from x.
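In Python the two procedures might look as
follows. This is a sketch assuming a simple
Node class; like the pseudocode, both
functions assume x is not NIL (None).

```python
class Node:
    """Minimal node; None plays the role of NIL."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def tree_minimum(x):
    """Follow left pointers from x; x must not be None."""
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    """Follow right pointers from x; x must not be None."""
    while x.right is not None:
        x = x.right
    return x


root = Node(15,
            Node(6, Node(3, Node(2), Node(4)),
                 Node(7, None, Node(13, Node(9)))),
            Node(18, Node(17), Node(20)))

print(tree_minimum(root).key, tree_maximum(root).key)  # 2 20
print(tree_minimum(root.left.right).key)  # 7, minimum of the subtree at 7
```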
Successor and predecessor
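If all keys are distinct, the successor of a
node x is the node with the smallest key
greater than key[x]. The following procedure
returns the successor of a node x in a BST,
or NIL if key[x] is the largest key in the
tree: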
TREE-SUCCESSOR(x)
1 if right[x] not = NIL
2    then return TREE-MINIMUM(right[x])
3 y <- p[x]
4 while y not = NIL and x = right[y]
5    do x <- y
6       y <- p[x]
7 return y
If the right subtree of x is not empty, the
successor is the left-most node in the right
subtree, found by TREE-MINIMUM(right[x]).
If the right subtree of x is empty and x has
a successor y, then y is the lowest ancestor
of x whose left child is also an ancestor of x
(Exercise 12.2-6), where every node counts as
an ancestor of itself. To find such a lowest
ancestor, y, we go up the tree from x until we
reach a node that is the left child of its
parent (lines 3-7).
The running time of TREE-SUCCESSOR is O(h):
we follow a single path either up the tree or
down the tree, any such path has length at
most h, and we execute a constant number of
operations at each node. The same is true of
TREE-PREDECESSOR, defined symmetrically.
Even if the keys are not distinct, we can
define the successor or predecessor as the
node returned by those procedures.
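A Python sketch of TREE-SUCCESSOR and its
symmetric counterpart follows. It assumes a
Node class whose constructor also sets the
parent pointer p of each child; that
convenience is an assumption of this example,
not something the text prescribes.

```python
class Node:
    """Node with parent pointer p; None plays the role of NIL."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.p = self


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    while x.right is not None:
        x = x.right
    return x


def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.p
    while y is not None and x is y.right:  # climb while x is a right child
        x = y
        y = x.p
    return y


def tree_predecessor(x):
    if x.left is not None:
        return tree_maximum(x.left)
    y = x.p
    while y is not None and x is y.left:  # climb while x is a left child
        x = y
        y = x.p
    return y


n13 = Node(13, Node(9))
root = Node(15,
            Node(6, Node(3, Node(2), Node(4)), Node(7, None, n13)),
            Node(18, Node(17), Node(20)))

print(tree_successor(n13).key)         # 15: 13 has no right subtree
print(tree_predecessor(n13.left).key)  # 7: predecessor of 9

# Repeated successor calls from the minimum visit the keys in order.
keys, x = [], tree_minimum(root)
while x is not None:
    keys.append(x.key)
    x = tree_successor(x)
print(keys)  # [2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20]
```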
Theorem: The dynamic-set queries MINIMUM,
MAXIMUM, SUCCESSOR, PREDECESSOR, and SEARCH
can be made to run in O(h) time in a BST.
12.3 Insertion and deletion
The insertion and deletion operations modify
the dynamic set represented by a BST. The
tree must be changed in a way that preserves
the binary-search-tree property.
Insertion
To insert a new node, z, into a BST, T, we
assume that key[z] = v, and p[z], left[z],
and right[z] are all NIL.
TREE-INSERT(T,z)
1 y <- NIL
2 x <- root[T]
3 while x not = NIL
4    do y <- x
5       if key[z] < key[x]
6          then x <- left[x]
7          else x <- right[x]
8 p[z] <- y
9 if y = NIL                 |> Tree T was empty
10   then root[T] <- z
11   else if key[z] < key[y]
12           then left[y] <- z
13           else right[y] <- z
Figure 12.3, page 262, shows how TREE-INSERT
works: it begins at the root and traces a
path downward. Lines 3-7 move x down the tree
while maintaining y as the parent of x.
When x becomes NIL, that is where we want to
place z; lines 8-13 set pointers to do that.
TREE-INSERT runs in O(h) time for the same
reason as the query procedures above.
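TREE-INSERT can be sketched in Python as
below. The Node and Tree classes and the
inorder_keys helper are assumptions made for
the example; None plays the role of NIL. The
keys are inserted in an order chosen so that
13 arrives last.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None  # all NIL initially


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, z):
    y = None          # trailing pointer: parent of x
    x = T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:     # tree T was empty
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def inorder_keys(x):
    """Hypothetical helper: keys of the subtree at x in sorted order."""
    if x is None:
        return []
    return inorder_keys(x.left) + [x.key] + inorder_keys(x.right)


T = Tree()
nodes = {k: Node(k) for k in (12, 5, 18, 2, 9, 15, 19, 17, 13)}
for z in nodes.values():
    tree_insert(T, z)

print(inorder_keys(T.root))  # [2, 5, 9, 12, 13, 15, 17, 18, 19]
print(nodes[13].p.key)       # 15: 13 became the left child of 15
```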
Deletion
The procedure for deleting a node takes as
an argument a pointer to the node, z. It
considers 3 cases, shown in Figure 12.4: if z
has no children, we set its parent's pointer
to it to NIL; if z has only one child, we
"splice it out" by linking its parent to its
child; and if z has two children, we splice
out its successor y, which has no left child,
and replace z's key and satellite data with
those of y. The code organizes these three
cases a little differently.
TREE-DELETE(T,z)
1 if left[z] = NIL or right[z] = NIL
2    then y <- z
3    else y <- TREE-SUCCESSOR(z)
4 if left[y] not = NIL
5    then x <- left[y]
6    else x <- right[y]
7 if x not = NIL
8    then p[x] <- p[y]
9 if p[y] = NIL
10   then root[T] <- x
11   else if y = left[p[y]]
12           then left[p[y]] <- x
13           else right[p[y]] <- x
14 if y not = z
15   then key[z] <- key[y]
16        copy y's satellite data into z
17 return y
Lines 1-3 determine a node y to splice out: y
is either z (if z has at most one child) or
the successor of z (if z has two children).
Lines 4-6 set x to the non-NIL child of y, or
to NIL if y has no children. (y has at most
one child: when y = TREE-SUCCESSOR(z), y has
no left child.)
Lines 7-13 splice out y by modifying pointers
in p[y] and x. The boundary cases x = NIL
(line 7) and y being the root (line 9) are
handled explicitly.
If the successor, y, of z was the node that
was spliced out, y's key and satellite data
are copied into z in lines 14-16.
The node y is returned in line 17 so that the
calling procedure can free the memory it uses.
TREE-DELETE runs in O(h) time, since all
steps take a constant amount of time except
TREE-SUCCESSOR, which runs in O(h) time.
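The three cases can be exercised with a
Python sketch of TREE-DELETE. The classes and
helpers are assumptions for illustration. In
the two-children case the sketch calls
tree_minimum(z.right), which is what
TREE-SUCCESSOR(z) returns when the right
subtree is not empty.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, z):
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_delete(T, z):
    # Choose the node y to splice out (z itself, or z's successor).
    y = z if z.left is None or z.right is None else tree_minimum(z.right)
    # x is y's only child, or None if y has no children.
    x = y.left if y.left is not None else y.right
    if x is not None:
        x.p = y.p
    if y.p is None:
        T.root = x
    elif y is y.p.left:
        y.p.left = x
    else:
        y.p.right = x
    if y is not z:
        z.key = y.key  # satellite data would be copied here too
    return y


def inorder_keys(x):
    if x is None:
        return []
    return inorder_keys(x.left) + [x.key] + inorder_keys(x.right)


T = Tree()
nodes = {k: Node(k) for k in (15, 5, 16, 3, 12, 20, 10, 13, 18, 23, 6, 7)}
for z in nodes.values():
    tree_insert(T, z)

tree_delete(T, nodes[13])            # no children
tree_delete(T, nodes[16])            # one child (20)
removed = tree_delete(T, nodes[5])   # two children: successor 6 spliced out
print(inorder_keys(T.root))  # [3, 6, 7, 10, 12, 15, 18, 20, 23]
print(removed is nodes[6], nodes[5].key)  # True 6
```

The last line illustrates a consequence of
copying keys: the node object that used to
hold 5 now holds 6, and the object returned
for freeing is the one that originally held 6.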
Theorem: The dynamic-set modifying operations
INSERT and DELETE can be made to run in
O(h) time on a BST of height h.
12.4 Randomly built binary search trees
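A randomly built BST on n keys is one built
by inserting the keys in random order into an
initially empty tree, with each of the n!
orders equally likely. The expected height
of a randomly built BST on n distinct keys is
O(lg(n)).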
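The effect of insertion order on height can
be observed with a small experiment. This is
an informal sketch, not a proof: the function
bst_height, the list-based node layout, the
choice n = 1023, and the fixed seed are all
assumptions made for the example.

```python
import math
import random


def bst_height(keys):
    """Height, in edges, of the BST built by inserting keys in order."""
    root = None          # each node is a list [key, left, right]
    height = 0
    for k in keys:
        node = [k, None, None]
        if root is None:
            root = node
            continue
        x, depth = root, 0
        while True:
            depth += 1
            i = 1 if k < x[0] else 2   # index of the child to follow
            if x[i] is None:
                x[i] = node
                break
            x = x[i]
        height = max(height, depth)
    return height


n = 1023
rng = random.Random(12)  # arbitrary fixed seed, for repeatability
heights = []
for _ in range(20):
    keys = list(range(n))
    rng.shuffle(keys)
    heights.append(bst_height(keys))

print("sorted order :", bst_height(range(n)))  # a chain: height n - 1
print("random orders:", sum(heights) / len(heights), "on average")
print("floor(lg n)  :", math.floor(math.log2(n)))  # least possible height
```

Inserting the keys in sorted order produces
the "linear" tree, while random orders
typically give heights that are a small
multiple of lg(n).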