Chapter 13 Red-Black Trees

Red-black trees are one scheme for ensuring that binary search trees remain balanced, so that their height never grows beyond O(lg n), where n is the number of keys.

13.1 Properties of red-black trees

A red-black tree is a binary search tree in which each node carries one extra bit of data, its color: RED or BLACK. By constraining how nodes can be colored, red-black trees ensure that no path from the root to a leaf is more than twice as long as any other such path, so red-black trees are approximately balanced.

A binary search tree is a red-black tree if it satisfies the red-black tree properties:

1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, then both of its children are black.
5. For each node, all paths from that node to descendant leaves contain the same number of black nodes.

Figure 13.1(a) shows an example.

For convenience, we use a single sentinel, T.nil, to represent NIL in the tree T. Its color is BLACK. It stands in for all the leaves and for the parent of the root. Figure 13.1(b) shows an example. Since we are only interested in internal, key-holding nodes, we omit the leaves when drawing, as shown in Figure 13.1(c).

We define the black-height, bh(x), of a node x as the number of black nodes on a path from x down to a leaf, not counting x itself; this is well defined by property 5. The black-height of a tree is the black-height of its root.

Lemma 13.1
A red-black tree with n internal nodes has height at most 2 lg(n + 1).

Proof: We first show, by induction on the height of x, that the subtree rooted at a node x contains at least 2^bh(x) - 1 internal nodes. If the height of x is 0, then x must be a leaf and the subtree rooted at x contains 2^bh(x) - 1 = 2^0 - 1 = 0 internal nodes. If x has height > 0, each child has black-height either bh(x) (if the child is red) or bh(x) - 1 (if it is black), and since a child has height less than x, we can apply the induction hypothesis: each child's subtree has at least 2^(bh(x)-1) - 1 internal nodes.
So the subtree rooted at x contains at least 2(2^(bh(x)-1) - 1) + 1 = 2^bh(x) - 1 internal nodes, as desired.

Now let h be the height of the tree. By property 4, at least half the nodes on any simple path from the root to a leaf, not including the root, must be black. So the black-height of the root must be at least h/2, and thus

    n >= 2^bh(root) - 1 >= 2^(h/2) - 1.

Hence n + 1 >= 2^(h/2), and taking lg of each side gives lg(n + 1) >= h/2, or h <= 2 lg(n + 1), which is what we wanted, finishing the proof.

Consequently, the dynamic-set queries SEARCH, MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR run in O(lg n) time on a red-black tree, since each runs in O(h) time on a search tree of height h, and any red-black tree with n nodes is a search tree of height O(lg n).

Note: TREE-INSERT and TREE-DELETE of Chapter 12 would also run in O(lg n) time, but they would not necessarily preserve the red-black tree properties. With care, however, INSERT and DELETE _can_ be made to run in O(lg n) time while preserving the red-black tree properties, as Sections 13.3 and 13.4 show. This is done by performing rotations and recoloring nodes, which maintain the red-black tree properties and so keep the tree balanced.

13.2 Rotations

A rotation is a local operation that preserves the binary-search-tree property. Figure 13.2 shows left and right rotations. For a left rotation on a node x, we assume that x.right = y is not T.nil. The rotation "pivots" counter-clockwise around the link from x to y, making y the new root of the subtree and x its left child. Here is the code for LEFT-ROTATE. Figure 13.3 shows how it works.

    LEFT-ROTATE(T,x)
     1  y = x.right              // Set y
     2  x.right = y.left         // Turn y's left subtree
     3  if y.left != T.nil       //   into x's right subtree
     4      y.left.p = x
     5  y.p = x.p                // Link x's parent to y
     6  if x.p == T.nil          // x was the root
     7      T.root = y
     8  else if x == x.p.left
     9      x.p.left = y
    10  else x.p.right = y
    11  y.left = x               // Put x on y's left
    12  x.p = y

The code for RIGHT-ROTATE is symmetric. Both procedures change only a fixed number of pointers, so they run in O(1) time.

13.3 Insertion

Insertion into an n-node red-black tree can be done in O(lg n) time. We slightly modify TREE-INSERT to insert the node z, colored red, and then call RB-INSERT-FIXUP to re-establish the red-black tree properties by recoloring nodes and performing rotations.

    RB-INSERT(T,z)
     1  y = T.nil
     2  x = T.root
     3  while x != T.nil
     4      y = x
     5      if z.key < x.key
     6          x = x.left
     7      else x = x.right
     8  z.p = y
     9  if y == T.nil
    10      T.root = z
    11  else if z.key < y.key
    12      y.left = z
    13  else y.right = z
    14  z.left = T.nil
    15  z.right = T.nil
    16  z.color = RED
    17  RB-INSERT-FIXUP(T,z)

The four modifications to TREE-INSERT are:

1) T.nil replaces NIL,
2) z's children are set to T.nil,
3) z is colored RED,
4) RB-INSERT-FIXUP is called.

    RB-INSERT-FIXUP(T,z)
     1  while z.p.color == RED
     2      if z.p == z.p.p.left
     3          y = z.p.p.right
     4          if y.color == RED
     5              z.p.color = BLACK            // case 1
     6              y.color = BLACK              // case 1
     7              z.p.p.color = RED            // case 1
     8              z = z.p.p                    // case 1
     9          else if z == z.p.right
    10                  z = z.p                  // case 2
    11                  LEFT-ROTATE(T,z)         // case 2
    12              z.p.color = BLACK            // case 3
    13              z.p.p.color = RED            // case 3
    14              RIGHT-ROTATE(T,z.p.p)        // case 3
    15      else (same as "if" clause with "right" and "left" exchanged)
    16  T.root.color = BLACK

We examine the code in three major steps:

1) What violations of the red-black tree properties does RB-INSERT introduce?
2) What is the goal of the while loop in lines 1-15?
3) How do the three cases perform the fix-up?

Figure 13.4 shows a sample fix-up.

1) After RB-INSERT adds the red node z, properties 1, 3, and 5 still hold, but property 2 (the root is black) or property 4 (a red node cannot have a red child) may be violated. Property 2 is violated if z is the root; property 4 is violated if z's parent is red.
2) The while loop maintains the following three-part invariant. At the start of each iteration of the loop:

a. Node z is red.
b. If z.p is the root, then z.p is black.
c. There is at most one violation of the red-black properties, of either property 2 or property 4. If property 2 is violated, it is because z is the root and is red. If property 4 is violated, it is because both z and z.p are red.

To check the invariant, we start with the initialization and termination arguments. For maintenance, we note that each iteration has one of two outcomes: z moves up the tree, or some rotations are done and the loop terminates.

Initialization:

a. When RB-INSERT-FIXUP is called, z is the red node that was added.
b. If z.p is the root, then z.p started out black and has not changed.
c. If there is a violation of property 2, the red root must be the new node z, which is the only internal node; its parent and both children are the (black) sentinel, so there is no violation of property 4. If property 4 is violated, then since the children of z are black and the tree had no other violations before z was added, the only violation is that both z and z.p are red.

Termination:

The loop terminates when z.p is black. (If z is the root, z.p is T.nil, which is black.) Thus there is no violation of property 4, so the only possible violation is of property 2, which line 16 fixes. So all red-black properties hold when RB-INSERT-FIXUP ends.

Maintenance:

There are six cases to consider in the while loop, but three of them are symmetric to the other three, depending on whether z's parent z.p is a left or right child of z.p.p, which line 2 determines. This is major step 3) of our analysis.

If z.p is the root, it is black. Since we only enter the loop if z.p is red, we know z.p isn't the root, and so z.p.p exists.

We distinguish case 1 from cases 2 and 3 by the color of z's parent's sibling, or "uncle". Line 3 makes y point to z's uncle. Line 4 tests whether y is red; if so, we execute case 1, and otherwise we execute cases 2 and 3.
In all three cases, z.p.p is black, since z.p is red and property 4 is violated only between z and z.p.

Case 1: z's uncle is red

Figure 13.5 shows case 1 (lines 5-8), which is executed when both z.p and y are red. Since z.p.p is black, we can color both z.p and y black (fixing the problem of z and z.p both being red) and color z.p.p red, which maintains property 5. The pointer z then moves up two levels to z.p.p, and the loop repeats.

Now we show that case 1 maintains the loop invariant. Let z be the node of the current iteration and z' = z.p.p be the node called z at the beginning of the next iteration.

a. Since this iteration colors z.p.p red, z' is red when the next iteration starts.
b. z'.p is z.p.p.p in this iteration, and its color doesn't change. If z'.p is the root, it was black before this iteration and remains black.
c. We have shown that case 1 maintains property 5, and it doesn't cause violations of properties 1 and 3.

   If z' is the root at the start of the next iteration, then case 1 corrected the lone violation of property 4. Since z' is red and is the root, property 2 is the only one violated, and the violation is due to z'.

   If z' is not the root at the start of the next iteration, then case 1 has not created a violation of property 2. Case 1 fixed the lone violation of property 4 that existed at the start of this iteration. It made z' red and left z'.p alone. If z'.p was black, there is no violation of property 4. If z'.p was red, coloring z' red created one violation of property 4, between z' and z'.p.

Figure 13.6 shows cases 2 and 3:

Case 2: z's uncle is black and z is a right child
Case 3: z's uncle is black and z is a left child

Cases 2 and 3 are distinguished by whether z is a left or right child. In case 2, z is a right child, and we use a left rotation to transform the situation into case 3, where z is a left child. Since both z and z.p are red, the rotation doesn't affect the black-height of nodes or property 5. In either case, z's uncle is black, since otherwise we would be in case 1.
Also, z.p.p exists, since it existed before this iteration, and lines 10 and 11 move z up one level and then down one level, so the identity of z.p.p is unchanged. In case 3, we do some color changes and a right rotation, which preserve property 5, and then we are done, since there are no longer two red nodes in a row. The while loop does not execute again, since z.p is now black.

Next we show that cases 2 and 3 maintain the loop invariant.

a. Case 2 makes z point to z.p, which is red. No other change to z occurs in cases 2 and 3.
b. Case 3 makes z.p black, so if z.p is the root at the start of the next iteration, it is black.
c. As in case 1, cases 2 and 3 maintain properties 1, 3, and 5.

   Since z is not the root in cases 2 and 3, we know property 2 isn't violated. Cases 2 and 3 don't cause a violation of property 2, since the only node made red becomes a child of a black node by the rotation in case 3.

   Cases 2 and 3 correct the lone violation of property 4, and they do not cause another violation.

This finishes the proof. Since each iteration of the loop maintains the invariant, RB-INSERT-FIXUP correctly restores the red-black properties.

Analysis

Since the height of a red-black tree with n nodes is O(lg n), lines 1-16 of RB-INSERT take O(lg n) time. In RB-INSERT-FIXUP, the loop repeats only in case 1, and then z moves up the tree two levels, so the while loop executes O(lg n) times in total. Thus RB-INSERT takes a total of O(lg n) time. Note: it never performs more than two rotations, since the loop terminates if case 2 or case 3 is executed.

13.4 Deletion

Like insertion, deletion from an n-node red-black tree takes O(lg n) time. It's a bit more complicated than insertion.
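
Before the deletion procedures, here is a minimal Python sketch of the rotations, RB-INSERT, and RB-INSERT-FIXUP from Sections 13.2 and 13.3. It is an illustration under stated assumptions, not a definitive implementation. The names Node, RBTree, rotate, insert, and check are hypothetical. Children are stored in a two-element list c (c[0] = left, c[1] = right) so that one direction parameter d covers both symmetric halves of the pseudocode. The helper check is an addition for this sketch: it asserts properties 2, 4, and 5 and reports the number of keys and the height.

```python
RED, BLACK = "RED", "BLACK"

class Node:
    def __init__(self, key, color, nil=None):
        self.key, self.color = key, color
        self.p = nil            # parent
        self.c = [nil, nil]     # c[0] = left child, c[1] = right child

class RBTree:
    def __init__(self):
        self.nil = Node(None, BLACK)          # the sentinel T.nil
        self.nil.p = self.nil
        self.nil.c = [self.nil, self.nil]
        self.root = self.nil

    def rotate(self, x, d):
        """d = 0 is LEFT-ROTATE(T, x); d = 1 is RIGHT-ROTATE(T, x)."""
        y = x.c[1 - d]
        x.c[1 - d] = y.c[d]                   # move y's inner subtree to x
        if y.c[d] is not self.nil:
            y.c[d].p = x
        y.p = x.p                             # link x's parent to y
        if x.p is self.nil:
            self.root = y
        elif x is x.p.c[0]:
            x.p.c[0] = y
        else:
            x.p.c[1] = y
        y.c[d] = x
        x.p = y

    def insert(self, key):
        z = Node(key, RED, self.nil)          # new node: red, children T.nil
        y, x = self.nil, self.root
        while x is not self.nil:              # ordinary BST descent
            y, x = x, x.c[0 if key < x.key else 1]
        z.p = y
        if y is self.nil:
            self.root = z
        else:
            y.c[0 if key < y.key else 1] = z
        self._insert_fixup(z)

    def _insert_fixup(self, z):
        while z.p.color == RED:
            d = 0 if z.p is z.p.p.c[0] else 1  # side of z.p under z.p.p
            y = z.p.p.c[1 - d]                 # z's uncle
            if y.color == RED:                 # case 1: recolor, move z up
                z.p.color = y.color = BLACK
                z.p.p.color = RED
                z = z.p.p
            else:
                if z is z.p.c[1 - d]:          # case 2: rotate into case 3
                    z = z.p
                    self.rotate(z, d)
                z.p.color = BLACK              # case 3
                z.p.p.color = RED
                self.rotate(z.p.p, 1 - d)
        self.root.color = BLACK

def check(t):
    """Assert properties 2, 4, and 5; return (number of keys, height)."""
    assert t.root.color == BLACK                               # property 2
    def walk(n):                     # -> (blacks to leaf, keys, height)
        if n is t.nil:
            return 0, 0, 0
        if n.color == RED:                                     # property 4
            assert n.c[0].color == BLACK and n.c[1].color == BLACK
        (bl, kl, hl), (br, kr, hr) = walk(n.c[0]), walk(n.c[1])
        assert bl == br                                        # property 5
        return bl + (n.color == BLACK), kl + kr + 1, 1 + max(hl, hr)
    _, keys, height = walk(t.root)
    return keys, height

t = RBTree()
for k in range(1, 1001):       # sorted input: worst case for a plain BST
    t.insert(k)
keys, height = check(t)
```

Inserting the keys 1 to 1000 in sorted order would give an ordinary binary search tree of height 1000; here the height reported by check must stay within the bound 2 lg(n + 1) of Lemma 13.1.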
We modify TRANSPLANT to apply to red-black trees:

    RB-TRANSPLANT(T,u,v)
    1  if u.p == T.nil          // u is the root
    2      T.root = v
    3  else if u == u.p.left    // u is a left child
    4      u.p.left = v
    5  else u.p.right = v       // u is a right child
    6  v.p = u.p

RB-TRANSPLANT differs from TRANSPLANT in two ways: T.nil replaces NIL, and the assignment v.p = u.p in line 6 is done unconditionally, which is possible because v.p can be assigned even when v is the sentinel T.nil.

RB-DELETE is like TREE-DELETE, with added lines that (1) keep track of a node y that might cause violations of the red-black properties, (2) remember y's original color, and (3) keep track of the node x that moves into y's original position. If z has fewer than two children, y is z itself; if z has two children, y is z's successor, which moves into z's position. Finally, RB-DELETE-FIXUP is called to change colors and do rotations to restore the red-black properties.

    RB-DELETE(T,z)
     1  y = z
     2  y-original-color = y.color
     3  if z.left == T.nil                    // case (a)
     4      x = z.right
     5      RB-TRANSPLANT(T,z,z.right)
     6  else if z.right == T.nil              // case (b)
     7      x = z.left
     8      RB-TRANSPLANT(T,z,z.left)
     9  else y = TREE-MINIMUM(z.right)
    10      y-original-color = y.color
    11      x = y.right
    12      if y.p == z                       // case (c)
    13          x.p = y
    14      else RB-TRANSPLANT(T,y,y.right)
    15          y.right = z.right             // case (d) step 1
    16          y.right.p = y
    17      RB-TRANSPLANT(T,z,y)              // case (c)
    18      y.left = z.left                   //   and case (d)
    19      y.left.p = y                      //   step 2
    20      y.color = z.color
    21  if y-original-color == BLACK
    22      RB-DELETE-FIXUP(T,x)

In addition to the trivial replacements of NIL by T.nil and TRANSPLANT by RB-TRANSPLANT, TREE-DELETE and RB-DELETE differ as follows:

- We maintain y as either z, if z had fewer than two children, or z's successor otherwise.
- We save y's original color, and if z had two children, we give y z's color. At the end, if y's original color was BLACK, we call RB-DELETE-FIXUP to fix color problems.
- We also maintain x as the node that moves into y's original position, so x.p is set to point to the original position of y's parent, even if x is T.nil.
- Finally, if y was black, there may be some violations of the red-black properties, so we call RB-DELETE-FIXUP to fix them.

If y was red, the red-black properties still hold when y is removed or moved, since:

1. No black-heights have changed.
2. No red nodes have been made adjacent. Because y takes z's place and z's color, we cannot have two adjacent red nodes at y's new position. Also, if y was not z's right child, then y's original right child x replaces y; since y was red, x must be black, so replacing y by x cannot make two red nodes adjacent.
3. Since y was red, it could not have been the root, so the root remains black.

If y was black, three problems may arise, which RB-DELETE-FIXUP fixes:

(1) If y had been the root and a red child of y becomes the new root, property 2 is violated.
(2) If both x and x.p are red, property 4 is violated.
(3) Moving y may cause some simple path that contained y to have one fewer black node, violating property 5.

We can fix the property 5 violation by saying that x carries an "extra" black. Thus x is either "doubly black" or "red-and-black" (its color attribute is actually BLACK or RED, respectively), and x will _always_ point to the only node that has double coloring. With this interpretation property 5 holds again, but x is now neither red nor black, so property 1 is violated instead.

Here is the code for RB-DELETE-FIXUP:

    RB-DELETE-FIXUP(T,x)
     1  while x != T.root and x.color == BLACK
     2      if x == x.p.left
     3          w = x.p.right
     4          if w.color == RED
     5              w.color = BLACK                     // case 1
     6              x.p.color = RED                     // case 1
     7              LEFT-ROTATE(T,x.p)                  // case 1
     8              w = x.p.right                       // case 1
     9          if w.left.color == BLACK and w.right.color == BLACK
    10              w.color = RED                       // case 2
    11              x = x.p                             // case 2
    12          else if w.right.color == BLACK
    13                  w.left.color = BLACK            // case 3
    14                  w.color = RED                   // case 3
    15                  RIGHT-ROTATE(T,w)               // case 3
    16                  w = x.p.right                   // case 3
    17              w.color = x.p.color                 // case 4
    18              x.p.color = BLACK                   // case 4
    19              w.right.color = BLACK               // case 4
    20              LEFT-ROTATE(T,x.p)                  // case 4
    21              x = T.root                          // case 4
    22      else (same as "if" clause but with "right" and "left" exchanged)
    23  x.color = BLACK

RB-DELETE-FIXUP restores properties 1, 2, and 4. Exercises 13.4-1 and 13.4-2 ask you to show that it restores properties 2 and 4, so we focus on property 1.
The goal of the while loop in lines 1-22 is to move the extra black up the tree until:

1. x points to a red-and-black node, in which case line 23 colors it (singly) black,
2. x points to the root, in which case the extra black can simply be "removed", or
3. suitable rotations and recolorings have been performed, and we exit the loop.

Within the while loop, x always points to a nonroot doubly black node. Line 2 determines whether x is a left or right child (RB-DELETE-FIXUP shows the code for a left child; the code for a right child, line 22, is symmetric). We maintain w as (a pointer to) the sibling of x. Since x is doubly black, w cannot be T.nil; otherwise the number of blacks on the path from x.p to the leaf w would be smaller than the number on paths from x.p through x.

The four cases in the code are illustrated in Figure 13.7. We must show that property 5 is preserved in each case. The key idea is that, in each case, the transformation preserves the number of black nodes, including x's extra black, on the paths from the root of the subtree shown to each of the subtrees alpha, beta, ..., zeta. For example:

In Figure 13.7(a) (case 1), the number of black nodes from the root to either of the subtrees alpha and beta is 3, both before and after the transformation. (Remember that x adds an extra black.) And the number of black nodes from the root to any of gamma, delta, epsilon, and zeta is 2, both before and after the transformation.

In Figure 13.7(b) (case 2), the root of the subtree shown can be either red or black. To cut down on the number of cases, we let c be its color and let count(c) denote the "black count" of a color: count(RED) = 0 and count(BLACK) = 1. Then the number of black nodes from the root to alpha or beta is 2 + count(c), both before and after the transformation. The counts for the other subtrees are likewise unchanged.

Now we analyze the cases.

Case 1: x's sibling w is red

This is the case in Figure 13.7(a) and lines 5-8. Since w is red, it must have black children, so we can switch the colors of w and x.p and then perform a left rotation on x.p without violating any of the red-black properties.
The new sibling of x, which was one of w's children before the rotation, is black, so we have converted case 1 into case 2, 3, or 4.

In cases 2, 3, and 4, w is black; they are distinguished by the colors of w's children.

Case 2: x's sibling w is black, and both of w's children are black

This is the case in Figure 13.7(b) and lines 10-11. Since w is also black, we take one black off both x and w, leaving x with only one black and w red, and put an extra black on x.p to compensate; x.p becomes the new x. Note that if we enter case 2 through case 1, the new x is red-and-black, since the original x.p was red. So the color attribute of the new x is RED, the loop terminates, and line 23 colors the new x (singly) black.

Case 3: x's sibling w is black, w's left child is red, and w's right child is black

This is the case in Figure 13.7(c) and lines 13-16. We can switch the colors of w and w.left and then do a right rotation on w without violating any of the red-black properties. The new sibling w of x is now black with a red right child, so we are in case 4.

Case 4: x's sibling w is black, and w's right child is red

This is the case in Figure 13.7(d) and lines 17-21. By making some color changes and doing a left rotation on x.p, we can remove the extra black on x, making it singly black, without violating any of the red-black properties. Then we set x to the root, which terminates the while loop.

Analysis

The running time of RB-DELETE without the call to RB-DELETE-FIXUP is O(lg n), as we saw when analyzing ordinary binary search trees, since the height of the tree is O(lg n). In RB-DELETE-FIXUP, cases 1, 3, and 4 each lead to termination after a constant number of color changes and at most three rotations. Case 2 is the only case in which the while loop can repeat, and then the pointer x moves up the tree at most O(h) = O(lg n) times, with no rotations performed. Thus RB-DELETE-FIXUP takes O(lg n) time and performs at most three rotations. So the overall time for RB-DELETE is also O(lg n).
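
To make the four cases concrete, the following is a minimal Python sketch of RB-TRANSPLANT, RB-DELETE, and RB-DELETE-FIXUP. It is a sketch under stated assumptions, not a definitive implementation. The names Node, RBTree, node, rotate, transplant, delete, and check are hypothetical. Children live in a list c (c[0] = left, c[1] = right) so that a direction d replaces the duplicated "right and left exchanged" branch. To stay short, the sketch has no insertion: as a demo shortcut it builds a perfect all-black tree from 2^k - 1 keys, which satisfies all five properties. The helper check is an addition that asserts properties 2, 4, and 5 and returns the keys in order.

```python
RED, BLACK = "RED", "BLACK"

class Node:
    def __init__(self, key, color, p=None):
        self.key, self.color, self.p, self.c = key, color, p, [None, None]

class RBTree:
    def __init__(self, keys):
        self.nil = Node(None, BLACK)          # sentinel T.nil
        self.nil.p, self.nil.c = self.nil, [self.nil, self.nil]
        self.node = {}                        # key -> node, for the demo
        self.root = self._build(sorted(keys), self.nil)

    def _build(self, keys, parent):
        # Demo shortcut: 2^k - 1 keys give a perfect, all-black (valid) tree.
        if not keys:
            return self.nil
        m = len(keys) // 2
        n = self.node[keys[m]] = Node(keys[m], BLACK, parent)
        n.c = [self._build(keys[:m], n), self._build(keys[m + 1:], n)]
        return n

    def transplant(self, u, v):               # RB-TRANSPLANT
        if u.p is self.nil:
            self.root = v
        else:
            u.p.c[0 if u is u.p.c[0] else 1] = v
        v.p = u.p                             # unconditional, even for T.nil

    def rotate(self, x, d):                   # d=0 LEFT-ROTATE, d=1 RIGHT-ROTATE
        y = x.c[1 - d]
        x.c[1 - d] = y.c[d]
        if y.c[d] is not self.nil:
            y.c[d].p = x
        self.transplant(x, y)                 # y takes x's place under x.p
        y.c[d], x.p = x, y

    def delete(self, z):                      # RB-DELETE
        y, y_color = z, z.color
        if z.c[0] is self.nil or z.c[1] is self.nil:
            x = z.c[1] if z.c[0] is self.nil else z.c[0]
            self.transplant(z, x)
        else:
            y = z.c[1]                        # y = TREE-MINIMUM(z.right)
            while y.c[0] is not self.nil:
                y = y.c[0]
            y_color, x = y.color, y.c[1]
            if y.p is z:
                x.p = y
            else:
                self.transplant(y, x)
                y.c[1] = z.c[1]
                y.c[1].p = y
            self.transplant(z, y)
            y.c[0] = z.c[0]
            y.c[0].p = y
            y.color = z.color
        if y_color == BLACK:
            self._delete_fixup(x)

    def _delete_fixup(self, x):               # RB-DELETE-FIXUP
        while x is not self.root and x.color == BLACK:
            d = 0 if x is x.p.c[0] else 1     # side of x under x.p
            w = x.p.c[1 - d]                  # sibling of x
            if w.color == RED:                # case 1
                w.color, x.p.color = BLACK, RED
                self.rotate(x.p, d)
                w = x.p.c[1 - d]
            if w.c[0].color == BLACK and w.c[1].color == BLACK:
                w.color = RED                 # case 2: push extra black up
                x = x.p
            else:
                if w.c[1 - d].color == BLACK: # case 3: rotate into case 4
                    w.c[d].color, w.color = BLACK, RED
                    self.rotate(w, 1 - d)
                    w = x.p.c[1 - d]
                w.color, x.p.color = x.p.color, BLACK   # case 4
                w.c[1 - d].color = BLACK
                self.rotate(x.p, d)
                x = self.root
        x.color = BLACK

def check(t):   # assert properties 2, 4, 5; return the keys in order
    assert t.root.color == BLACK
    def walk(n):                              # -> (black count, keys)
        if n is t.nil:
            return 0, []
        if n.color == RED:
            assert n.c[0].color == BLACK and n.c[1].color == BLACK
        (bl, kl), (br, kr) = walk(n.c[0]), walk(n.c[1])
        assert bl == br
        return bl + (n.color == BLACK), kl + [n.key] + kr
    return walk(t.root)[1]

t = RBTree(range(1, 16))                      # perfect tree on keys 1..15
for k in [8, 11, 10, 1, 15, 4, 12, 2, 14, 6, 3, 13, 5, 7, 9]:
    t.delete(t.node[k])
    check(t)                                  # properties hold after each step
```

Because RB-DELETE moves the node y into z's position rather than copying keys, the node references kept in the hypothetical node dictionary stay valid for every key that has not yet been deleted.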