Part V Advanced Data Structures

Introduction

Here we examine dynamic-set data structures, but at a more advanced level than in Part III.

Chapter 18 discusses B-trees, balanced search trees designed to be stored on magnetic disks.

Chapter 19 discusses Fibonacci heaps, which support the mergeable-heap operation UNION (uniting two heaps) in addition to INSERT, MINIMUM, and EXTRACT-MIN. The amortized running times of their operations match or improve on the times for the binary heaps of Chapter 6.

Chapter 20 discusses van Emde Boas trees, which store unique keys from the range {0, 1, 2, ..., u-1}, where u = 2^k; their dynamic-set operations run in O(lg lg u) time.

Chapter 21 discusses data structures for disjoint sets: a collection of n elements, each initially in its own singleton set. The structure supports the UNION and FIND-SET operations; FIND-SET(x) returns a pointer to the set containing x. It can be implemented very efficiently: a sequence of m operations takes O(m alpha(n)) time, where alpha(n) <= 4 for n <= 10^80.

Other advanced data structures:

- Dynamic trees maintain a forest of disjoint rooted trees. Each edge has a real-valued cost. Dynamic trees support queries to find parents, roots, edge costs, and the minimum edge cost on a path from a node to the root. Costs may be updated on such a path, trees may be linked, and edges may be removed.
- Splay trees, a form of binary search tree, have operations that run in amortized O(lg n) time and are often used to simplify dynamic trees.
- Persistent data structures allow queries and updates on past versions of a data structure.
- Some structures give faster implementations of the dictionary operations (INSERT, DELETE, and SEARCH). Exponential search trees and others give improved bounds on some or all of the operations. Fusion trees implement the operations in O(lg n / lg lg n) time when the universe of keys is the integers.
  Another structure has O(lg lg n) running time for MINIMUM, MAXIMUM, EXTRACT-MIN, EXTRACT-MAX, PREDECESSOR, and SUCCESSOR, in addition to INSERT, DELETE, and SEARCH.
- Dynamic graph data structures allow insertion and deletion of vertices and edges, in addition to queries.

Chapter 18 B-Trees

B-trees are balanced search trees designed to work well with direct-access secondary storage devices, usually magnetic disks. Database systems often use B-trees or variants of them. An n-node B-tree has height Theta(lg n) but a large "branching factor", so common dynamic-set operations take O(lg n) time.

Figure 18.1 (page 485) shows a sample B-tree. An internal node x with x.n keys has x.n + 1 children, so when we encounter x during a search, we make an (x.n + 1)-way decision based on the x.n keys.

Section 18.1 defines a B-tree, Section 18.2 shows B-tree search and insertion, and Section 18.3 discusses deletion. But first we discuss issues involved with data structures stored on disks.

Data structures on secondary storage

Primary memory (or main memory) consists of silicon memory chips. Secondary storage usually consists of magnetic disks. It is usually at least 10 times cheaper than chip memory, and there is usually at least 100 times as much of it.

Figure 18.2(a) shows a typical disk drive. It consists of several platters (the disks) that revolve at a constant speed around a common spindle. The surface of each disk is coated with a magnetizable material, which is read from and written to by read/write heads at the ends of arms. The arms are ganged together so that they move toward and away from the spindle in unison. While a head is stationary, it stays over a single track; all the tracks under the heads form a cylinder, as shown in Figure 18.2(b).

Disks are much slower than main memory, since they have moving parts.
Typically the disks rotate at 7200 RPM (revolutions per minute), so one rotation takes 8.33 milliseconds; moving the arms also takes time, so the average access time is in the 8 to 11 millisecond range. Main memory access takes about 50 nanoseconds, so a disk access is at least 100,000 times slower. Disks transfer equal-sized pages of data at a time, each from 2^11 to 2^14 bytes.

Due to the large difference in access time, running-time analysis of data structures that use secondary storage is broken down into two components:

- the number of disk accesses, and
- the CPU (computing) time.

The disk access time for a page is assumed to be constant (for example, the average time).

In B-tree applications, the data is much larger than would fit in main memory, so B-tree algorithms assume that only a (small) constant number of pages are in main memory at any one time. Letting x be a pointer to a data object, we can refer to x.key and other fields as usual. If x is only on disk, we must copy it into main memory with a DISK-READ(x) operation (if x is already in main memory, DISK-READ(x) is a no-op). DISK-WRITE(x) is used to save changes to x. So a typical pattern for working with x is:

    x = a pointer to a data object
    DISK-READ(x)
    operations that access and/or modify fields of x
    DISK-WRITE(x)      // Omit if x was not changed
    operations that access (only) fields of x

To make disk operations efficient, B-tree nodes are usually as large as a disk page, and the page size therefore limits the number of children a node can have. Branching factors between 50 and 2000 are often used, depending on the size of the keys. Figure 18.3 shows that a B-tree with a branching factor of 1001 and height 2 can store over one billion keys, yet only 2 disk accesses are needed to find any key, since the root is always in main memory.

18.1 Definition of B-trees

For simplicity, we assume that satellite data is stored in the same node as the key and travels with it if the key is moved.
Often in practice, only a pointer to the satellite data is kept with the key. A B+tree stores all the satellite data in leaves and stores only keys and child pointers in internal nodes.

A B-tree T is a rooted tree satisfying:

1. Every node x has the following fields:
   a. x.n, the number of keys currently stored in x,
   b. the x.n keys themselves, in nondecreasing order: x.key_1 <= x.key_2 <= ... <= x.key_{x.n},
   c. x.leaf, which is TRUE if x is a leaf and FALSE otherwise.
2. Each internal node x also contains x.n + 1 pointers x.c_1, x.c_2, ..., x.c_{x.n+1} to its children; these fields are undefined for leaves.
3. The keys separate the ranges of keys stored in each subtree: if k_i is any key stored in the subtree with root x.c_i, then
   k_1 <= x.key_1 <= k_2 <= x.key_2 <= ... <= x.key_{x.n} <= k_{x.n+1}
4. All the leaves have the same depth, which is the tree's height h.
5. There are lower and upper bounds on the number of keys a node can contain, which are determined by the minimum degree, t >= 2:
   a. Every node except the root must have at least t - 1 keys, and thus every internal node except the root has at least t children. If the tree is nonempty, the root must have at least one key.
   b. Every node can contain at most 2t - 1 keys, and therefore an internal node has at most 2t children. We say that a node is full if it contains exactly 2t - 1 keys.

In the simplest B-tree, t = 2, and every internal node has 2, 3, or 4 children: a 2-3-4 tree.

The height of a B-tree

The number of disk accesses required for most B-tree operations is proportional to the height of the tree. We now analyze the worst-case height of a B-tree.

Theorem 18.1 If n >= 1, then for any n-key B-tree of height h and minimum degree t >= 2,

    h <= log_t( (n + 1)/2 )

Proof: The root contains at least one key and all other nodes contain at least t - 1 keys. So there are at least 2 nodes at depth 1, at least 2t nodes at depth 2, at least 2t^2 nodes at depth 3, and so on, until there are at least 2t^(h-1) nodes at depth h. Figure 18.4 shows a tree for h = 3.
Thus the number of keys, n, satisfies the inequality:

    n >= 1 + (t - 1) * Sum_{i=1}^{h} 2t^(i-1)
       = 1 + 2(t - 1) * (t^h - 1)/(t - 1)
       = 2t^h - 1

Adding 1 to both sides, dividing by 2, and taking logarithms base t finishes the proof.

Here we see the power of B-trees, as compared to red-black trees. Though the height of both trees grows as O(lg n), for B-trees the base of the logarithm is usually much larger. Thus B-trees save a factor of about lg(t) in the number of nodes accessed in common operations, since log_t(n) = lg(n)/lg(t). Consequently, the number of disk accesses is reduced substantially.

18.2 Basic operations on B-trees

In this section, we present the details of B-TREE-SEARCH, B-TREE-CREATE, and B-TREE-INSERT. We adopt two conventions:

- The root is always in main memory, so that a DISK-READ of the root is never needed; but a DISK-WRITE is required whenever the root is changed.
- Any nodes passed as parameters must already have had a DISK-READ performed on them.

The procedures are "one pass" algorithms that proceed downward from the root of the tree.

Searching a B-tree

Searching a B-tree is similar to searching a binary search tree, except that at each node x we make an (x.n + 1)-way branching decision. In B-TREE-SEARCH(x,k), x is the root of the subtree to be searched, and k is the key value; to search the whole tree T, call B-TREE-SEARCH(T.root,k). If k is in the tree, the ordered pair (y,i) is returned, where y is a node and i is an index with y.key_i = k; otherwise NIL is returned.

    B-TREE-SEARCH(x,k)
    1  i = 1
    2  while i <= x.n and k > x.key_i
    3      i = i + 1
    4  if i <= x.n and k == x.key_i
    5      return (x,i)
    6  if x.leaf
    7      return NIL
    8  else DISK-READ(x.c_i)
    9      return B-TREE-SEARCH(x.c_i,k)

In a linear search, lines 1-3 find the smallest index i such that k <= x.key_i, or set i to x.n + 1. Lines 4-5 check whether we have found the key, returning if so. Lines 6-9 either terminate the search unsuccessfully (if x is a leaf) or recurse to search the appropriate subtree of x after performing the necessary DISK-READ on that child.
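The search procedure can be tried out on a small in-memory tree. The following Python sketch is only an illustration under stated assumptions: the Node class, the 0-based indices, and the hand-built example tree are hypothetical and not part of the notes, and no disk is involved, so the point where the pseudocode calls DISK-READ is marked only by a comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical in-memory stand-in for a B-tree node; a real B-tree
    # would keep each node in a disk page and DISK-READ it on descent.
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def btree_search(x: Node, k: str) -> Optional[Tuple[Node, int]]:
    """Return (node, index) with node.keys[index] == k, or None."""
    i = 0
    # Linear scan: smallest i with k <= x.keys[i], or i == len(x.keys).
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return (x, i)
    if x.leaf:
        return None
    # The pseudocode performs DISK-READ(x.c_i) here before recursing.
    return btree_search(x.children[i], k)


# A small hand-built tree with minimum degree t = 2 (a 2-3-4 tree).
root = Node(
    keys=["G", "M"],
    children=[
        Node(keys=["A", "C"]),
        Node(keys=["J", "K"]),
        Node(keys=["P", "R", "T"]),
    ],
)

hit = btree_search(root, "R")    # found in the rightmost leaf
miss = btree_search(root, "L")   # absent, so the search ends at a leaf
```

Because Python lists are 0-based, the returned index is one less than the index i of the pseudocode.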
The search path for key R is shown in Figure 18.1. B-TREE-SEARCH does O(h) = O(log_t(n)) DISK-READs. Since x.n < 2t, the while loop takes O(t) time, so the total CPU time is O(th) = O(t log_t(n)).

Creating an empty B-tree

To build a B-tree, we first use B-TREE-CREATE to create an empty root node, then add new keys with B-TREE-INSERT. These procedures both use ALLOCATE-NODE, which allocates one disk page as a new node in O(1) time. The new node requires no DISK-READ, since there is no useful information stored in it yet.

    B-TREE-CREATE(T)
    1  x = ALLOCATE-NODE()
    2  x.leaf = TRUE
    3  x.n = 0
    4  DISK-WRITE(x)
    5  T.root = x

Inserting a key into a B-tree

Inserting into a B-tree is harder than inserting into a binary search tree, since we can't just create a new leaf node; doing so would violate the B-tree properties. If a leaf node is not full, we can insert the new key into it. But if a node y is full, we can split it around its median key y.key_t into 2 nodes with t - 1 keys each; the median key moves up to y's parent as the dividing value between the two nodes. But if y's parent is also full, it would have to be split too, and this could propagate all the way to the root.

We can insert into a B-tree in a single pass down the tree, but we must split every full node we meet on the way down, to avoid the upward propagation of splits mentioned above.

- Splitting a node in a B-tree

B-TREE-SPLIT-CHILD has 2 arguments: a nonfull internal node x and an index i such that x.c_i is full (both nodes are assumed to be in main memory). The procedure splits x.c_i in two and adjusts x to have one more child. To split the root, we make it the child of a new empty root node, then call B-TREE-SPLIT-CHILD. The tree height thus grows by 1; this is the only way it can grow. Figure 18.5 illustrates the splitting process.
    B-TREE-SPLIT-CHILD(x,i)
    1  z = ALLOCATE-NODE()
    2  y = x.c_i
    3  z.leaf = y.leaf
    4  z.n = t - 1
    5  for j = 1 to t - 1
    6      z.key_j = y.key_{j+t}
    7  if not y.leaf
    8      for j = 1 to t
    9          z.c_j = y.c_{j+t}
    10 y.n = t - 1
    11 for j = x.n + 1 downto i + 1
    12     x.c_{j+1} = x.c_j
    13 x.c_{i+1} = z
    14 for j = x.n downto i
    15     x.key_{j+1} = x.key_j
    16 x.key_i = y.key_t
    17 x.n = x.n + 1
    18 DISK-WRITE(y)
    19 DISK-WRITE(z)
    20 DISK-WRITE(x)

Lines 1-9 create node z and copy the larger t - 1 keys and t children from y to z, and line 10 adjusts y's key count. Lines 11-17 insert z as a child of x, moving the median key up from y and adjusting x's key count. Lines 18-20 write the modified disk pages.

The procedure B-TREE-SPLIT-CHILD takes Theta(t) CPU time due to lines 5-6; the other loops take O(t) CPU time. The procedure also performs Theta(1) disk operations.

- Inserting a key into a B-tree

Inserting a key is done by B-TREE-INSERT in a single pass down the tree, requiring Theta(h) disk accesses and O(t log_t(n)) CPU time. It avoids inserting into a full child by using B-TREE-SPLIT-CHILD.

    B-TREE-INSERT(T,k)
    1  r = T.root
    2  if r.n == 2t - 1
    3      s = ALLOCATE-NODE()
    4      T.root = s
    5      s.leaf = FALSE
    6      s.n = 0
    7      s.c_1 = r
    8      B-TREE-SPLIT-CHILD(s,1)
    9      B-TREE-INSERT-NONFULL(s,k)
    10 else B-TREE-INSERT-NONFULL(r,k)

If the root is full, lines 3-9 split it, and a new node s (with 2 children) becomes the root. This is the only way to increase the height of a B-tree, and it is illustrated by Figure 18.6, page 496. B-TREE-INSERT finishes by calling B-TREE-INSERT-NONFULL to insert key k into the tree rooted at a nonfull node. B-TREE-INSERT-NONFULL recurses down the tree, ensuring that the node to which it recurses is nonfull by calling B-TREE-SPLIT-CHILD as necessary.

B-TREE-INSERT-NONFULL inserts key k into node x, which is assumed to be nonfull; the way B-TREE-INSERT and B-TREE-INSERT-NONFULL operate guarantees that this assumption holds.
    B-TREE-INSERT-NONFULL(x,k)
    1  i = x.n
    2  if x.leaf
    3      while i >= 1 and k < x.key_i
    4          x.key_{i+1} = x.key_i
    5          i = i - 1
    6      x.key_{i+1} = k
    7      x.n = x.n + 1
    8      DISK-WRITE(x)
    9  else while i >= 1 and k < x.key_i
    10         i = i - 1
    11     i = i + 1
    12     DISK-READ(x.c_i)
    13     if x.c_i.n == 2t - 1
    14         B-TREE-SPLIT-CHILD(x,i)
    15         if k > x.key_i
    16             i = i + 1
    17     B-TREE-INSERT-NONFULL(x.c_i,k)

B-TREE-INSERT-NONFULL works as follows. Lines 3-8 handle the case in which x is a leaf, and k is inserted into it. If x is not a leaf, we must insert k into the appropriate leaf node in the subtree rooted at x. Lines 9-11 determine the child of x to which the recursion descends. Line 13 tests whether that child is full; if so, line 14 splits it, and lines 15-16 determine which of the two resulting children to descend to. (Note: we don't need a DISK-READ(x.c_i) after line 16, since the recursion descends to a child that was just created by B-TREE-SPLIT-CHILD.) The net effect of lines 13-16 is to ensure that we never recurse to a full node. Line 17 then recurses to insert k into the correct subtree.

Figure 18.7, page 498, shows various cases of insertion into a B-tree: case (a) shows the initial tree, case (b) shows insertion into a nonfull leaf, case (c) shows insertion into a full leaf, case (d) shows insertion when the root is full, which splits it, and case (e) shows another insertion that splits a leaf.

B-TREE-INSERT does Theta(h) DISK-READs for a tree of height h, and O(h) DISK-WRITEs. The total CPU time used is O(th) = O(t log_t(n)). Since B-TREE-INSERT-NONFULL is tail-recursive, it can be implemented as a while loop, showing that the number of pages needed in main memory at any one time is O(1).

18.3 Deleting a key from a B-tree

Deletion from a B-tree is a bit more complicated than insertion, since the deleted key can be in an internal node (not just a leaf), which requires rearranging the node's children. We also have to ensure that a node doesn't get too small (except the root, which may have fewer than t - 1 keys).
A simple approach might have to back up if the node from which the key was deleted had the minimum number of keys. We will design B-TREE-DELETE so that whenever it is called recursively on a node x, the number of keys in x is at least t. So sometimes a key may have to be moved into a child before the recursion can descend to that child. This allows us to delete a key in one downward pass without having to "back up" (with 1 exception, explained later).

In the specification below for deletion, we assume that if the last key is removed from the root node x (which can occur in cases 2c and 3b), then x is deleted and x's only child x.c_1 becomes the new root (decreasing the height of the tree by one).

We sketch below how deletion works; k is the key to be deleted. Figure 18.8, pages 500-501, illustrates various cases of deleting keys from a B-tree.

1. If k is in a leaf node x, delete k from x.
2. If k is in an internal node x, do this:
   a. If the child y that precedes k in x has at least t keys, find the predecessor k' of k in the subtree rooted at y. Recursively delete k', and replace k by k' in x. (Finding and deleting k' can be performed in a single downward pass.)
   b. Symmetrically, if the child z that follows k in x has at least t keys, find the successor k' of k in the subtree rooted at z. Recursively delete k', and replace k by k' in x. (Finding and deleting k' can be performed in a single downward pass.)
   c. Otherwise, both y and z have only t - 1 keys; merge k and all of z into y, so that x loses both k and the pointer to z, and y now contains 2t - 1 keys. Then free z and recursively delete k from y.
3. If k is not in internal node x, determine the root x.c_i of the subtree that must contain k, if k is in the tree at all. If x.c_i has only t - 1 keys, execute step 3a or 3b as needed to ensure that we descend to a node with at least t keys. Then finish by recursing on the appropriate child of x.
   a. If x.c_i has only t - 1 keys but has an immediate sibling with at least t keys, give x.c_i an extra key by moving a key down from x into x.c_i, moving a key from the sibling up into x, and moving the appropriate child pointer from the sibling into x.c_i.
   b. If x.c_i and both of its immediate siblings have only t - 1 keys, merge x.c_i with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.

Since most of the keys in a B-tree are in the leaves, we expect that most deletions take place there. In that case, deletion acts in one downward pass, without having to back up. When deleting a key k from an internal node, the procedure makes a downward pass but then returns to the node from which key k was deleted, to replace k with its predecessor or successor (cases 2a and 2b).

As with insertion, deletion does Theta(h) DISK-READs and O(h) DISK-WRITEs. Also, as with insertion, the total CPU time used is O(th) = O(t log_t(n)).
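
As a closing illustration, the single-pass insertion of Section 18.2 can be sketched in Python and used to check the height bound of Theorem 18.1 numerically. This is a minimal in-memory sketch, not a definitive implementation: the class names BTreeNode and BTree, the helper methods height and inorder, the 0-based indexing, and the use of Python lists and the bisect module in place of fixed-size disk pages are all assumptions made for illustration, and the DISK-READ and DISK-WRITE steps are omitted.

```python
import bisect
import math
from typing import List


class BTreeNode:
    """Hypothetical in-memory node; keys and children are Python lists."""

    def __init__(self, leaf: bool = True) -> None:
        self.keys: List[int] = []
        self.children: List["BTreeNode"] = []
        self.leaf = leaf


class BTree:
    def __init__(self, t: int) -> None:
        if t < 2:
            raise ValueError("minimum degree t must be at least 2")
        self.t = t
        self.root = BTreeNode(leaf=True)

    def split_child(self, x: BTreeNode, i: int) -> None:
        """Split the full child x.children[i] around its median key."""
        t = self.t
        y = x.children[i]
        z = BTreeNode(leaf=y.leaf)
        median = y.keys[t - 1]
        z.keys = y.keys[t:]           # the largest t - 1 keys move to z
        y.keys = y.keys[: t - 1]      # y keeps the smallest t - 1 keys
        if not y.leaf:
            z.children = y.children[t:]
            y.children = y.children[:t]
        x.children.insert(i + 1, z)   # z becomes the child just after y
        x.keys.insert(i, median)      # the median moves up into x

    def insert(self, k: int) -> None:
        r = self.root
        if len(r.keys) == 2 * self.t - 1:
            # Full root: grow the tree by one level, then split the old root.
            s = BTreeNode(leaf=False)
            s.children.append(r)
            self.root = s
            self.split_child(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def _insert_nonfull(self, x: BTreeNode, k: int) -> None:
        # Loop form of the tail-recursive procedure: x is never full here.
        while not x.leaf:
            i = bisect.bisect_right(x.keys, k)
            if len(x.children[i].keys) == 2 * self.t - 1:
                self.split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        bisect.insort(x.keys, k)

    def height(self) -> int:
        h, x = 0, self.root
        while not x.leaf:
            x = x.children[0]
            h += 1
        return h

    def inorder(self) -> List[int]:
        out: List[int] = []

        def walk(x: BTreeNode) -> None:
            for j, key in enumerate(x.keys):
                if not x.leaf:
                    walk(x.children[j])
                out.append(key)
            if not x.leaf:
                walk(x.children[-1])

        walk(self.root)
        return out


tree = BTree(t=2)
keys = [(37 * j) % 101 for j in range(100)]   # 100 distinct keys, scrambled
for key in keys:
    tree.insert(key)

n = len(keys)
bound = math.log((n + 1) / 2, tree.t)         # Theorem 18.1: h <= log_t((n+1)/2)
print(tree.height(), "<=", bound)
```

The loop in _insert_nonfull reflects the remark in Section 18.2 that the tail recursion can be replaced by a while loop. Deletion is left out of the sketch; it would follow cases 1 to 3 above.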