Chapter 12 Binary Search Trees

Dynamic sets are sets that can grow or shrink (by adding or removing elements). Search trees are data structures that support many of the dynamic-set operations: SEARCH, MINIMUM, MAXIMUM, PREDECESSOR, SUCCESSOR, INSERT, and DELETE. Thus a search tree can be used both as a dictionary and as a priority queue.

Basic operations on a binary search tree (BST) take time proportional to the height h of the tree. For a complete binary tree with n nodes this is Theta(lg(n)) in the worst case, and for a "linear" tree (a chain of n nodes) it is Theta(n). The expected height of a randomly built BST is O(lg(n)), so the operations take Theta(lg(n)) time on average. There are variations on BSTs whose worst-case performance can be guaranteed to be good.

12.1 What is a binary search tree?

Each node contains a key and pointers (possibly NIL) left and right (to its children) and p (to its parent), in addition to satellite data. See Figure 12.1 page 354 for examples.

The keys satisfy the binary-search-tree property: for any node x, if y is a node in the left subtree of x, then key[y] <= key[x]; if y is a node in the right subtree of x, then key[y] >= key[x].

We can visit the nodes of a BST, T, in sorted order by key using an inorder tree walk (preorder and postorder tree walks can also be done). The following prints the keys in sorted order with the call INORDER-TREE-WALK(root[T]):

    INORDER-TREE-WALK(x)
    if x not = NIL
        then INORDER-TREE-WALK(left[x])
             print key[x]
             INORDER-TREE-WALK(right[x])

Theorem 12.1 If x is the root of an n-node subtree, then the call INORDER-TREE-WALK(x) takes Theta(n) time.

Proof: Let T(n) denote the time taken by INORDER-TREE-WALK(x) when x is the root of an n-node subtree. Then T(0) = c, a positive constant, the time to test whether x is NIL. For n > 0, suppose the left subtree of x has k nodes, so the right subtree has n - k - 1 nodes. Then T(n) = T(k) + T(n - k - 1) + d for some positive constant d.
We show that T(n) = (c + d)n + c by the substitution method. The base case holds, since (c + d)*0 + c = c = T(0). For n > 0:

    T(n) = T(k) + T(n - k - 1) + d
         = ((c + d)k + c) + ((c + d)(n - k - 1) + c) + d
         = (c + d)n + c - (c + d) + c + d
         = (c + d)n + c

Since (c + d)n + c is Theta(n), the theorem follows.

12.2 Querying a binary search tree

Query operations on a BST: SEARCH (the most common), MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR. Each can be performed in O(h) time, where h is the height of the tree.

Searching

Given a pointer to the root of a BST and a key k, TREE-SEARCH returns a pointer to a node with key k if one exists, or NIL if not.

    TREE-SEARCH(x,k)
    if x = NIL or k = key[x]
        then return x
    if k < key[x]
        then return TREE-SEARCH(left[x],k)
        else return TREE-SEARCH(right[x],k)

The search progresses downward in the tree, as in Figure 12.2, so the number of nodes encountered, and hence the running time, is O(h), where h is the height of the tree.

An iterative version, which avoids the overhead of recursive calls and is usually more efficient in practice:

    ITERATIVE-TREE-SEARCH(x,k)
    while x not = NIL and k not = key[x]
        do if k < key[x]
              then x <- left[x]
              else x <- right[x]
    return x

Minimum and maximum

A node in a BST whose key is a minimum can be found by following left pointers from the root until a NIL is encountered. The following procedure returns a pointer to the node with minimum key in a BST rooted at a given node x (assumed not NIL). It is correct by the binary-search-tree property.

    TREE-MINIMUM(x)
    while left[x] not = NIL
        do x <- left[x]
    return x

Similarly, the following procedure returns a pointer to the node with maximum key:

    TREE-MAXIMUM(x)
    while right[x] not = NIL
        do x <- right[x]
    return x

Both procedures run in O(h) time for the same reason that TREE-SEARCH does: each follows a single path downward from x.
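The query procedures above translate almost line for line into Python. The following is a minimal sketch, not a definitive implementation: the Node class, its attribute names (key, left, right, p), and the link helper used to hand-build a small tree are assumptions made for illustration, with None standing in for NIL.

```python
class Node:
    """BST node: a key plus left, right, and parent (p) pointers."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None


def link(parent, left=None, right=None):
    """Hypothetical helper: attach children and set their parent pointers."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.p = parent
    return parent


def tree_search(x, k):
    """Iterative search: return a node with key k, or None if absent."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


def tree_minimum(x):
    """Follow left pointers from a non-None node x."""
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    """Follow right pointers from a non-None node x."""
    while x.right is not None:
        x = x.right
    return x


# Small hand-built tree used only for this demonstration:
#         15
#        /  \
#       6    18
#      / \   / \
#     3   7 17  20
root = link(Node(15),
            link(Node(6), Node(3), Node(7)),
            link(Node(18), Node(17), Node(20)))

print(tree_search(root, 7).key)   # 7
print(tree_search(root, 8))       # None
print(tree_minimum(root).key)     # 3
print(tree_maximum(root).key)     # 20
```

Each function touches at most one node per level, which is the O(h) bound in concrete form.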
Successor and predecessor

The following procedure returns the successor of a node x in a BST (the next node in the order given by an inorder tree walk), or NIL if key[x] is the largest key in the tree:

    TREE-SUCCESSOR(x)
    1  if right[x] not = NIL
    2     then return TREE-MINIMUM(right[x])
    3  y <- p[x]
    4  while y not = NIL and x = right[y]
    5      do x <- y
    6         y <- p[x]
    7  return y

If the right subtree of x is not empty, the successor is the leftmost node in the right subtree, found by TREE-MINIMUM(right[x]) in lines 1-2. If the right subtree of x is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x (Exercise 12.2-6). To find this lowest ancestor y, we go up the tree (lines 3-7) until we reach a node that is the left child of its parent; that parent is y.

The running time of TREE-SUCCESSOR is O(h), since we follow a path either up the tree or down the tree, the length of such a path is O(h), and we execute a constant number of operations at each node. The same is true of TREE-PREDECESSOR, defined symmetrically.

Even if the keys are not distinct, we can define the successor or predecessor of a node as the node returned by these procedures.

Theorem: The dynamic-set queries MINIMUM, MAXIMUM, SUCCESSOR, PREDECESSOR, and SEARCH can be made to run in O(h) time in a BST of height h.

12.3 Insertion and deletion

The insertion and deletion operations modify the dynamic set, allowing it to change. For a BST, they must also preserve the binary-search-tree property.

Insertion

To insert a new value v into a BST T, we pass TREE-INSERT a node z with key[z] = v and with p[z], left[z], and right[z] all NIL.

    TREE-INSERT(T,z)
    1  y <- NIL
    2  x <- root[T]
    3  while x not = NIL
    4      do y <- x
    5         if key[z] < key[x]
    6            then x <- left[x]
    7            else x <- right[x]
    8  p[z] <- y
    9  if y = NIL                    |> Tree T was empty
    10    then root[T] <- z
    11    else if key[z] < key[y]
    12            then left[y] <- z
    13            else right[y] <- z

Figure 12.3, page 262, shows how TREE-INSERT works: it begins at the root and traces a path downward (lines 3-7), maintaining y as the parent of x. When x becomes NIL, that position is where we want to place z; lines 8-13 set the pointers to do that.
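As an illustration of TREE-SUCCESSOR and TREE-INSERT working together, here is a minimal Python sketch. The Node and Tree classes and the sorted_keys helper are assumptions introduced for this example (None plays the role of NIL); the two core functions follow the pseudocode step by step.

```python
class Node:
    """BST node: a key plus left, right, and parent (p) pointers."""

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    """Holds only the root pointer, like root[T] in the pseudocode."""

    def __init__(self):
        self.root = None


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_successor(x):
    # Case 1: non-empty right subtree, so take its leftmost node.
    if x.right is not None:
        return tree_minimum(x.right)
    # Case 2: climb until we leave a left subtree (or run out of ancestors).
    y = x.p
    while y is not None and x is y.right:
        x = y
        y = x.p
    return y


def tree_insert(T, z):
    y = None          # trailing pointer: parent of x
    x = T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:     # tree was empty
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def sorted_keys(T):
    """Hypothetical helper: list keys using only minimum and successor."""
    keys = []
    x = tree_minimum(T.root) if T.root is not None else None
    while x is not None:
        keys.append(x.key)
        x = tree_successor(x)
    return keys


T = Tree()
for k in [12, 5, 18, 2, 9, 15, 19, 13, 17]:
    tree_insert(T, Node(k))

print(sorted_keys(T))                         # [2, 5, 9, 12, 13, 15, 17, 18, 19]
print(tree_successor(T.root.left.right).key)  # node 9 has no right child: 12
```

The last line exercises the upward case: node 9 is a right child, so the loop climbs past 5 and stops at 12.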
TREE-INSERT runs in O(h) time for the same reason as the query procedures above.

Deletion

The procedure for deleting a node takes a pointer to the node, z, as an argument. It considers 3 cases, shown in Figure 12.4: if z has no children, we set its parent's pointer to it to NIL; if z has only one child, we "splice it out" by linking its parent to its child; and if z has two children, we splice out its successor y, which has no left child, and replace z's key and satellite data with y's. The code organizes these cases a bit differently.

    TREE-DELETE(T,z)
    1  if left[z] = NIL or right[z] = NIL
    2     then y <- z
    3     else y <- TREE-SUCCESSOR(z)
    4  if left[y] not = NIL
    5     then x <- left[y]
    6     else x <- right[y]
    7  if x not = NIL
    8     then p[x] <- p[y]
    9  if p[y] = NIL
    10    then root[T] <- x
    11    else if y = left[p[y]]
    12            then left[p[y]] <- x
    13            else right[p[y]] <- x
    14 if y not = z
    15    then key[z] <- key[y]
    16         copy y's satellite data into z
    17 return y

Lines 1-3 determine a node y to splice out: y is either z (if z has at most one child) or the successor of z (if z has two children). Lines 4-6 set x to the non-NIL child of y, or to NIL if y has no children; y has at most one child in either case, because when z has two children its successor has no left child. Lines 7-13 splice out y by modifying pointers in p[y] and x. The boundary cases need care: line 7 guards the update of p[x] when x = NIL, and lines 9-10 handle the case in which y is the root. If the successor of z was the node spliced out, y's key and satellite data are copied into z in lines 14-16. The node y is returned in line 17 so that the calling procedure can free the memory it uses.

TREE-DELETE runs in O(h) time, since all steps take a constant amount of time except TREE-SUCCESSOR, which runs in O(h) time.

Theorem: The dynamic-set modifying operations INSERT and DELETE can be made to run in O(h) time on a BST of height h.

12.4 Randomly built binary search trees

It can be shown that the expected height of a randomly built BST with n keys is O(lg(n)).
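To see the three deletion cases in action, the following Python sketch implements TREE-DELETE as given above, together with an inorder walk to check the result. It is an illustration under stated assumptions: the Node and Tree classes, the attribute names, and the sample keys are chosen for this example, None stands for NIL, and satellite data is omitted.

```python
class Node:
    """BST node: a key plus left, right, and parent (p) pointers."""

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, z):
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_search(x, k):
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


def tree_successor(x):
    if x.right is not None:
        x = x.right
        while x.left is not None:   # inlined TREE-MINIMUM
            x = x.left
        return x
    y = x.p
    while y is not None and x is y.right:
        x, y = y, y.p
    return y


def tree_delete(T, z):
    # Lines 1-3: choose the node y to splice out.
    y = z if z.left is None or z.right is None else tree_successor(z)
    # Lines 4-6: x is y's only child, or None.
    x = y.left if y.left is not None else y.right
    # Lines 7-13: splice y out.
    if x is not None:
        x.p = y.p
    if y.p is None:
        T.root = x
    elif y is y.p.left:
        y.p.left = x
    else:
        y.p.right = x
    # Lines 14-16: if the successor was spliced out, move its key into z.
    if y is not z:
        z.key = y.key   # satellite data would be copied here as well
    return y


def inorder(x, out=None):
    """Inorder tree walk that collects keys instead of printing them."""
    out = [] if out is None else out
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out


T = Tree()
for k in [15, 5, 16, 3, 12, 20, 10, 13, 18, 23, 6, 7]:
    tree_insert(T, Node(k))

tree_delete(T, tree_search(T.root, 13))   # no children
print(inorder(T.root))  # [3, 5, 6, 7, 10, 12, 15, 16, 18, 20, 23]
tree_delete(T, tree_search(T.root, 16))   # one child
print(inorder(T.root))  # [3, 5, 6, 7, 10, 12, 15, 18, 20, 23]
tree_delete(T, tree_search(T.root, 5))    # two children: successor 6 is spliced out
print(inorder(T.root))  # [3, 6, 7, 10, 12, 15, 18, 20, 23]
```

One consequence of this design is worth noting: in the two-child case the node object that leaves the tree is y, not z, so any outside reference to the successor node becomes stale after the call.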