Chapter 12 Binary Search Trees

Dynamic sets are sets that can grow or shrink (by adding or removing elements). Search trees are data structures that support many of the dynamic-set operations: SEARCH, MINIMUM, MAXIMUM, PREDECESSOR, SUCCESSOR, INSERT, and DELETE. Thus a search tree can be used both as a dictionary and as a priority queue.

Basic operations on a binary search tree (BST) take time proportional to the height h of the tree in the worst case. For a complete binary tree with n nodes this is Theta(lg(n)), and for a "linear" tree (a chain of n nodes) it is Theta(n). The expected height of a randomly built tree is O(lg(n)) (Section 12.4), so the operations take Theta(lg(n)) time on average. There are variations on BSTs whose worst-case performance can be guaranteed to be good.

12.1 What is a binary search tree?

Each node contains a key, pointers left and right (to its children) and p (to its parent), any of which may be NIL, and satellite data. See Figure 12.1 for examples. The keys satisfy the binary-search-tree property:

    For any node x, if y is a node in the left subtree of x, then y.key <= x.key; if y is a node in the right subtree of x, then y.key >= x.key.

We can visit the nodes of a BST T in sorted order by key using an inorder tree walk (preorder and postorder tree walks are also possible). The call INORDER-TREE-WALK(T.root) prints the keys in sorted order:

    INORDER-TREE-WALK(x)
        if x != NIL
            INORDER-TREE-WALK(x.left)
            print x.key
            INORDER-TREE-WALK(x.right)

Theorem 12.1 If x is the root of an n-node subtree, then the call INORDER-TREE-WALK(x) takes Theta(n) time.

Proof: Let T(n) denote the time taken by INORDER-TREE-WALK(x) when x is the root of an n-node subtree. Then T(0) = c for some positive constant c, the time to test whether x is NIL. For n > 0, suppose the left subtree of x has k nodes, so that the right subtree has n - k - 1 nodes. Then T(n) = T(k) + T(n - k - 1) + d for some positive constant d.
We show that T(n) = (c + d)n + c by the substitution method. The base case holds because (c + d)*0 + c = c = T(0). For n > 0, assuming the formula holds for all smaller subtrees,

    T(n) = T(k) + T(n - k - 1) + d
         = ((c + d)k + c) + ((c + d)(n - k - 1) + c) + d
         = (c + d)n + c - (c + d) + c + d
         = (c + d)n + c,

which is Theta(n).

12.2 Querying a binary search tree

The query operations on a BST are SEARCH (the most common), MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR. Each can be performed in O(h) time, where h is the height of the tree.

Searching

Given a pointer x to the root of a BST and a key k, TREE-SEARCH returns a pointer to a node with key k if one exists, or NIL if not.

    TREE-SEARCH(x, k)
        if x == NIL or k == x.key
            return x
        if k < x.key
            return TREE-SEARCH(x.left, k)
        else return TREE-SEARCH(x.right, k)

The search follows a simple path downward from the root, as in Figure 12.2, so the number of nodes encountered, and hence the running time, is O(h), where h is the height of the tree.

An iterative version replaces the recursion with a while loop and is typically more efficient in practice:

    ITERATIVE-TREE-SEARCH(x, k)
        while x != NIL and k != x.key
            if k < x.key
                x = x.left
            else x = x.right
        return x

Minimum and maximum

A node in a BST whose key is a minimum can be found by following left pointers from the root until a NIL is encountered. The following procedure returns a pointer to the node with minimum key in the subtree rooted at x, which is assumed not to be NIL. It is correct by the binary-search-tree property.

    TREE-MINIMUM(x)
        while x.left != NIL
            x = x.left
        return x

Similarly, the following procedure returns a pointer to the node with maximum key:

    TREE-MAXIMUM(x)
        while x.right != NIL
            x = x.right
        return x

Both procedures run in O(h) time for the same reason TREE-SEARCH does: the nodes visited form a simple path downward from the root.
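The query procedures above translate almost line for line into Python. The following is only a minimal sketch: the Node class, the link helper, and the example keys are assumptions made for illustration and do not come from the notes, and NIL is represented by None.

```python
class Node:
    """Hypothetical BST node; field names follow the notes (key, left, right, p)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None


def link(parent, left=None, right=None):
    """Illustrative helper: attach children to parent and set their p pointers."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.p = parent
    return parent


def tree_search(x, k):
    """Recursive search, mirroring TREE-SEARCH."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)


def iterative_tree_search(x, k):
    """Loop-based search, mirroring ITERATIVE-TREE-SEARCH."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


def tree_minimum(x):
    """Follow left pointers; assumes x is not None."""
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    """Follow right pointers; assumes x is not None."""
    while x.right is not None:
        x = x.right
    return x


# Hand-built example tree (keys chosen arbitrarily):
#         6
#       /   \
#      3     8
#     / \     \
#    2   5     9
root = link(Node(6), link(Node(3), Node(2), Node(5)), link(Node(8), None, Node(9)))

print(tree_search(root, 5).key)                        # key found: 5
print(iterative_tree_search(root, 7))                  # key absent: None
print(tree_minimum(root).key, tree_maximum(root).key)  # 2 9
```

Each function visits nodes along a single root-to-leaf path at most, which is why the O(h) bound applies to all of them.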
Successor and predecessor

The following procedure returns the successor of a node x in a BST, or NIL if x.key is the largest key in the tree:

    TREE-SUCCESSOR(x)
    1  if x.right != NIL
    2      return TREE-MINIMUM(x.right)
    3  y = x.p
    4  while y != NIL and x == y.right
    5      x = y
    6      y = y.p
    7  return y

If the right subtree of x is not empty, the successor is the leftmost node in the right subtree, found by TREE-MINIMUM(x.right) (lines 1-2). If the right subtree of x is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x, where every node counts as its own ancestor (Exercise 12.2-6). To find y, we go up the tree from x until we reach a node that is the left child of its parent (lines 3-7).

The running time of TREE-SUCCESSOR is O(h): we follow either a simple path up the tree or a simple path down the tree, such a path has length O(h), and we execute a constant number of operations at each node. The same is true of TREE-PREDECESSOR, which is defined symmetrically. Even if the keys are not distinct, we define the successor or predecessor of a node x as the node returned by TREE-SUCCESSOR(x) or TREE-PREDECESSOR(x).

Theorem: The dynamic-set queries SEARCH, MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR can each be made to run in O(h) time on a BST of height h.

12.3 Insertion and deletion

The insertion and deletion operations modify the dynamic set represented by a BST. The tree must change in a way that preserves the binary-search-tree property.

Insertion

To insert a new node z into a BST T, we assume that z.key = v for the value v being inserted, and that z.p, z.left, and z.right are all NIL.

    TREE-INSERT(T, z)
    1   y = NIL
    2   x = T.root
    3   while x != NIL
    4       y = x
    5       if z.key < x.key
    6           x = x.left
    7       else x = x.right
    8   z.p = y
    9   if y == NIL            // tree T was empty
    10      T.root = z
    11  else if z.key < y.key
    12      y.left = z
    13  else y.right = z

Figure 12.3 shows how TREE-INSERT works: it begins at the root and traces a simple path downward (lines 3-7), maintaining y as x's parent. When x becomes NIL, that NIL occupies the position where we want to place z; lines 8-13 set the pointers to do that.
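As a rough check on TREE-INSERT and TREE-SUCCESSOR, the sketch below builds a tree by repeated insertion and then lists its keys twice: once with an inorder walk and once by starting at the minimum and following successors. The Node and Tree classes, the inorder_keys helper (which collects keys rather than printing them), and the key sequence are assumptions made for illustration; NIL is represented by None.

```python
class Node:
    """Hypothetical node with the fields used in the notes."""

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    """Hypothetical tree object holding only a root pointer."""

    def __init__(self):
        self.root = None


def tree_insert(T, z):
    """Mirror of TREE-INSERT: walk down from the root, then hang z under y."""
    y = None
    x = T.root
    while x is not None:
        y = x                      # y trails x as its parent
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:                  # tree was empty
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_successor(x):
    """Mirror of TREE-SUCCESSOR; returns None for the maximum node."""
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.p
    while y is not None and x is y.right:
        x = y
        y = y.p
    return y


def inorder_keys(x):
    """Inorder tree walk that returns the keys as a list."""
    if x is None:
        return []
    return inorder_keys(x.left) + [x.key] + inorder_keys(x.right)


keys = [12, 5, 18, 2, 9, 15, 19, 17, 13]   # arbitrary example keys
T = Tree()
for k in keys:
    tree_insert(T, Node(k))

walk = inorder_keys(T.root)

# Enumerate the same keys by repeated successor calls.
via_successor = []
node = tree_minimum(T.root)
while node is not None:
    via_successor.append(node.key)
    node = tree_successor(node)

print(walk)            # [2, 5, 9, 12, 13, 15, 17, 18, 19]
print(via_successor)   # the same list
```

Both listings coming out sorted is consistent with the binary-search-tree property being preserved by every insertion.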
TREE-INSERT runs in O(h) time on a tree of height h, for the same reason as the query procedures above.

Deletion

The procedure for deleting a node z from a BST T takes T and (a pointer to) z as arguments. The overall strategy considers three cases (see Figure 12.4):

- If z has no children, we remove it by setting its parent's pointer to it to NIL.
- If z has only one child, we elevate that child to take z's position by having z's parent point to the child.
- If z has two children, we find z's successor y (in z's right subtree) and have y take z's position. The rest of z's original right subtree becomes y's new right subtree, and z's left subtree becomes y's new left subtree. It matters whether y is z's right child or not, leading to two subcases.

The delete procedure itself organizes these possibilities somewhat differently, into the four cases shown in Figure 12.4:

- If z has no left child (Fig. 12.4a), we replace z by its right child. When z's right child is NIL, z has no children; when z's right child is not NIL, z has one child.
- If z has just one child, which is its left child, we replace z by that child (Fig. 12.4b).
- Otherwise z has two children. We find z's successor y, which is in z's right subtree and has no left child (Exercise 12.2-5). We move y to z's position, adjusting subtrees:
    - If y is z's right child (Fig. 12.4c), we replace z by y, keeping y's right child.
    - If y is not z's right child (Fig. 12.4d), we first replace y by its own right child, then replace z by y.

We use a routine TRANSPLANT to move subtrees around in a binary tree: the subtree rooted at u is replaced by the subtree rooted at v, so that u's parent becomes v's parent and v becomes the appropriate child of u's parent.

    TRANSPLANT(T, u, v)
        if u.p == NIL              // u is the root
            T.root = v
        else if u == u.p.left      // u is a left child
            u.p.left = v
        else u.p.right = v         // u is a right child
        if v != NIL
            v.p = u.p

NOTE: TRANSPLANT does not update v.left or v.right; the caller does that if needed.
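The four cases and TRANSPLANT described above might be sketched in Python as follows. This is an illustration under assumptions rather than a definitive implementation: the Node and Tree classes, the tree_insert and inorder_keys helpers used to build and inspect the example, and the keys themselves are all hypothetical, and NIL is represented by None.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, z):
    """Helper used only to build the example tree."""
    y, x = None, T.root
    while x is not None:
        y, x = x, (x.left if z.key < x.key else x.right)
    z.p = y
    if y is None:
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def transplant(T, u, v):
    """Replace the subtree rooted at u by the subtree rooted at v (v may be None)."""
    if u.p is None:                # u is the root
        T.root = v
    elif u is u.p.left:            # u is a left child
        u.p.left = v
    else:                          # u is a right child
        u.p.right = v
    if v is not None:
        v.p = u.p                  # v's own children are left untouched


def tree_delete(T, z):
    if z.left is None:             # case (a): no left child
        transplant(T, z, z.right)
    elif z.right is None:          # case (b): only a left child
        transplant(T, z, z.left)
    else:
        y = tree_minimum(z.right)  # successor of z; it has no left child
        if y.p is not z:           # case (d): y is deeper in z's right subtree
            transplant(T, y, y.right)
            y.right = z.right
            y.right.p = y
        transplant(T, z, y)        # case (c), and the second step of case (d)
        y.left = z.left
        y.left.p = y


def inorder_keys(x):
    return [] if x is None else inorder_keys(x.left) + [x.key] + inorder_keys(x.right)


# Example tree (arbitrary keys):
#            12
#          /    \
#         5      18
#        / \    /  \
#       2   9  15   19
#          /     \
#         7       16
T = Tree()
nodes = {k: Node(k) for k in [12, 5, 18, 2, 9, 15, 19, 7, 16]}
for z in nodes.values():
    tree_insert(T, z)

tree_delete(T, nodes[9])    # case (b): 9 has only a left child
tree_delete(T, nodes[18])   # case (c): successor 19 is 18's right child
tree_delete(T, nodes[12])   # case (d): successor 15 is not 12's right child
tree_delete(T, nodes[2])    # case (a): 2 has no children
print(inorder_keys(T.root), T.root.key)   # [5, 7, 15, 16, 19] 15
```

The identity test `is` is used instead of `==` when comparing nodes, since the procedures compare pointers, not keys.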
    TREE-DELETE(T, z)
        if z.left == NIL                   // case (a)
            TRANSPLANT(T, z, z.right)
        else if z.right == NIL             // case (b)
            TRANSPLANT(T, z, z.left)
        else y = TREE-MINIMUM(z.right)
            if y.p != z                    // case (d), step 1
                TRANSPLANT(T, y, y.right)
                y.right = z.right
                y.right.p = y
            TRANSPLANT(T, z, y)            // case (c), and case (d), step 2
            y.left = z.left
            y.left.p = y

TREE-DELETE runs in O(h) time on a tree of height h, since every step takes a constant amount of time except the call to TREE-MINIMUM, which runs in O(h) time.

Theorem 12.3: The dynamic-set operations INSERT and DELETE can be made to run in O(h) time on a BST of height h.

12.4 Randomly built binary search trees

A randomly built BST on n keys is one that arises from inserting the keys in random order into an initially empty tree, where each of the n! orderings is equally likely. The text shows that the expected height of a randomly built BST on n distinct keys is O(lg(n)).
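The expected-height claim can be explored empirically. The sketch below is an informal experiment, not a proof: it inserts shuffled keys into an empty tree and records the resulting height, then compares against lg(n) and against the "linear" tree produced by sorted input. The bst_height function, the list-based node layout, the seed, and the values of n and trials are all assumptions chosen for illustration.

```python
import math
import random


def bst_height(keys):
    """Height (in edges) of the BST built by inserting keys in the given order.

    Nodes are small lists [key, left, right]; parent pointers are not needed
    here. The height is tracked as the maximum depth at which a node lands,
    so no recursion is required. An empty tree has height -1 by convention.
    """
    root = None
    height = -1
    for k in keys:
        node = [k, None, None]
        depth = 0
        if root is None:
            root = node
        else:
            x = root
            while True:
                side = 1 if k < x[0] else 2   # same comparison as TREE-INSERT
                depth += 1
                if x[side] is None:
                    x[side] = node
                    break
                x = x[side]
        height = max(height, depth)
    return height


n, trials = 1023, 20
rng = random.Random(0)            # fixed seed so reruns give the same numbers
heights = []
for _ in range(trials):
    keys = list(range(n))
    rng.shuffle(keys)             # one random insertion order
    heights.append(bst_height(keys))

print("lg(n)                  :", round(math.log2(n), 2))
print("average random height  :", sum(heights) / trials)
print("height for sorted input:", bst_height(range(n)))   # a chain: n - 1
```

The random heights should come out as a small multiple of lg(n), far below the n - 1 obtained for sorted input, which is what an O(lg(n)) expected height suggests.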