Part V Advanced Data Structures V.1
Introduction
Here we examine dynamic set data structures,
but at a more advanced level than Part III.
Chapter 18 discusses B-trees, balanced trees
designed to be stored on magnetic disks.
Chapters 19 and 20 present binomial heaps and
Fibonacci heaps respectively, which support
the mergeable heap operation of UNION that
unites two heaps in addition to INSERT,
MINIMUM, and EXTRACT-MIN. We use amortized
analysis, which is discussed in Chapter 17.
Fibonacci heaps have the best amortized run
times for their operations, and both kinds of
heaps improve upon the Theta(n) time to do
a union of two binary heaps of Chapter 6.
Chapter 21 discusses data structures for
disjoint sets: a collection of n elements,
each initially in its own singleton set. This
structure supports the UNION and FIND-SET
operations; FIND-SET(x) returns a pointer to
the set containing x. This structure can be
implemented very efficiently, with amortized
run time of O(m alpha(n)) on m operations,
where alpha(n) < 4 for n < 10^80.
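The disjoint-set structure above can be sketched as a forest with union by rank and path compression, the two heuristics behind the near-constant amortized bound. This is a minimal in-memory sketch in Python; the class and method names are illustrative, not from the text.

```python
# Disjoint-set forest: each element points at a parent; roots
# represent sets. Union by rank plus path compression gives the
# O(m alpha(n)) amortized bound mentioned above.

class DisjointSets:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # Path compression: make x point directly at its root.
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        # Union by rank: attach the shallower tree under the deeper.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

ds = DisjointSets()
for v in "abcd":
    ds.make_set(v)
ds.union("a", "b")
ds.union("c", "d")
same = ds.find_set("a") == ds.find_set("b")   # True
diff = ds.find_set("a") == ds.find_set("c")   # False
```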
Other advanced data structures: V.2
Dynamic trees maintain a forest of disjoint
rooted trees. Each edge has a real-valued
cost. Dynamic trees support queries to find
parents, roots, edge costs, and minimum cost
of a path from a node to the root. Costs
may be updated on such a path, trees may be
linked, and edges removed. They give good
performance in network-flow algorithms.
Splay trees, a form of binary search tree, have
operations that run in amortized time O(lg n)
and are often used to simplify dynamic trees.
Persistent data structures allow queries, and
sometimes updates too, on past versions of a
data structure.
There are some faster implementations that
support the dictionary operations (INSERT,
DELETE, and SEARCH). Exponential search
trees and others give improved bounds on
some or all of the operations. Fusion trees
implement the operations in O(lg n/ lg lg n)
time when the universe of keys is the
integers. Another structure has O(lg lg n)
run time for MINIMUM, MAXIMUM, EXTRACT-MIN,
EXTRACT-MAX, PREDECESSOR, and SUCCESSOR, in
addition to INSERT, DELETE, and SEARCH.
Dynamic graph data structures allow insertion
and deletion of vertices and edges, in
addition to queries.
Chapter 18 B-Trees 18.0.1
B-trees are balanced search trees designed to
work well with direct-access secondary storage
devices - usually magnetic disks. Database
systems often use B-trees or variants.
An n-node B-tree has height Theta(lg n) but
with a large "branching factor", so common
dynamic set operations take O(lg n) time.
Figure 18.1 page 435 shows a sample B-tree.
An internal node x with n[x] keys has n[x] + 1
children, so when we encounter x during a
search, we make an (n[x] + 1)-way decision
based on the n[x] keys.
Section 18.1 defines a B-tree, Section 18.2
shows B-tree search and insertion, and Section
18.3 discusses deletion. But first we discuss
issues involved with data structures stored on
disks.
Data structures on secondary storage
Primary memory (or main memory) consists of
silicon memory chips. Secondary storage
usually consists of magnetic disks. It is
usually about 100 times cheaper than chip
memory and there is usually at least 100 times
as much of it.
18.0.2
Figure 18.2(a), page 436, shows a typical
disk drive. It consists of several platters
(the disks) that revolve at a constant speed
around a common spindle. The surface of each
disk is coated with a magnetizable material
which can be read from and written to by
"read" heads at the end of arms. The arms are
ganged together so they move toward and away
from the spindle in unison. If the head is
stationary, it stays over a single track; all
the tracks under the heads form a cylinder as
shown in Figure 18.2(b)
Disks are much slower than main memory, since
they have moving parts. Typically the disks
rotate at 7200 RPM (Revolutions Per Minute),
so one rotation takes 8.33 milliseconds; also
moving the arms takes time, so average access
time is now in the 3 to 9 millisecond range.
Main memory access is about 100 nanoseconds,
so disk access is about 100,000 times slower.
Disks transfer equal-sized pages of data at a
time - typically from 2^11 to 2^14 bytes. Due to the
large difference in access time, running time
analysis on data structures using secondary
storage is broken down into two components:
- the number of disk accesses, and
- the CPU (computing) time.
Disk access time for a page is assumed to be
constant (maybe the average time).
18.0.3
In B-tree applications, the data is much
larger than would fit in main memory, so
B-tree algorithms assume that only a (small)
constant number of pages are in main memory at
any one time.
Letting x be a pointer to a data object, we
can refer to key[x] and other fields as usual.
If x is only on disk, we must copy it into
main memory with a DISK-READ(x) operation (if
x is in main memory, DISK-READ(x) is a no-op).
DISK-WRITE(x) is used to save changes in x.
So a typical pattern for working with x is:
x <- a pointer to a data object
DISK-READ(x)
operations that access and/or modify fields of x
DISK-WRITE(x) |> Omit if no fields of x were changed
other operations that access (but do not modify) fields of x
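The access pattern above can be modeled concretely. This is an illustrative Python sketch in which a dictionary stands in for the disk and another for main memory; all names here (disk_read, disk_write, the page contents) are assumptions for the example, not from the text.

```python
# Model of the DISK-READ / DISK-WRITE pattern: pages live on "disk"
# and must be copied into main memory before their fields are used.

disk = {"x": {"key": 41, "data": "payload"}}   # pages on disk
memory = {}                                    # pages in main memory

def disk_read(p):
    if p not in memory:            # no-op if already in main memory
        memory[p] = dict(disk[p])

def disk_write(p):
    disk[p] = dict(memory[p])      # save changes back to disk

x = "x"                            # a pointer to a data object
disk_read(x)
memory[x]["key"] += 1              # operations that modify fields of x
disk_write(x)                      # omit if x was not changed
print(disk["x"]["key"])            # 42
```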
To make disk operations efficient, B-tree
nodes are usually as large as a disk page, and
so that is the limit on the number of children
of a node. Branching factors between 50 and
2000 are often used, depending on the size of
the keys. Figure 18.3, page 438, shows that
a B-tree with a branching factor of 1001 and
height 2 can store over one billion keys, yet
only 2 disk accesses are needed to find any
key since the root is always in main memory.
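The billion-key claim is easy to verify with a little arithmetic: with branching factor 1001, a height-2 tree has 1 node at depth 0, 1001 at depth 1, and 1001^2 at depth 2, each holding 1000 keys.

```python
# Check the Figure 18.3 claim: a B-tree of height 2 with branching
# factor 1001 (1000 keys per node) stores over one billion keys.
branching = 1001
keys_per_node = branching - 1
nodes = sum(branching ** depth for depth in range(3))  # depths 0, 1, 2
total_keys = keys_per_node * nodes
print(total_keys)  # 1003003000
```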
18.1 Definition of B-trees 18.1.1
For simplicity, we assume that satellite data
is stored in the same node as the key and
travels with it if the key is moved. Often in
practice, only a pointer to the satellite data
is kept with the key. A B+tree stores all the
satellite data in leaves and stores only keys
and child pointers in internal nodes.
A B-tree T is a rooted tree satisfying:
1. Every node x has the following fields:
a. n[x], the number of keys currently in x,
b. the keys themselves, in nondecreasing
   order: key_1[x] <= key_2[x] <= ... <= key_{n[x]}[x],
c. leaf[x] = TRUE if x is a leaf, else FALSE
2. Each internal node x also contains n[x] + 1
pointers c_1[x], c_2[x], ..., c_{n[x]+1}[x]
to its children; undefined for leaves.
3. The keys separate the ranges of keys stored
in each subtree: if k_i is any key stored in
the subtree with root c_i[x], then
k_1 <= key_1[x] <= k_2 <= key_2[x] <= ...
   <= key_{n[x]}[x] <= k_{n[x]+1}
18.1.2
4. All the leaves have the same depth, which
is the tree's height h.
5. There are lower and upper bounds on the
number of keys a node can contain, which are
determined by the minimum degree, t >= 2:
a. Every node other than the root must have at
   least t - 1 keys; thus every internal node
   other than the root has at least t
   children. If the tree is nonempty, the
   root must have at least one key.
b. Every node can contain at most 2t - 1 keys,
   so an internal node can have at most 2t
   children. We say that a node is full if it
   contains exactly 2t - 1 keys.
In the simplest B-tree t = 2, and we have 2,
3, or 4 children - a 2-3-4 tree.
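The node fields in the definition above map naturally onto a small in-memory record. The following Python sketch (purely illustrative; the text's nodes live on disk pages) checks properties 1, 2, and 5 for a single node of minimum degree t.

```python
# In-memory rendering of a B-tree node's fields: n[x] is len(keys),
# key_i[x] are the sorted keys, leaf[x] is a flag, c_i[x] are the
# children of an internal node.

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []       # key_1[x], ..., key_n[x], nondecreasing
        self.children = []   # c_1[x], ...; empty for leaves
        self.leaf = leaf

    @property
    def n(self):
        return len(self.keys)

def check_node(x, t, is_root=False):
    """Check properties 1, 2, and 5 for one node of minimum degree t."""
    assert sorted(x.keys) == x.keys          # keys in nondecreasing order
    assert x.n <= 2 * t - 1                  # property 5b: at most 2t - 1 keys
    if not is_root:
        assert x.n >= t - 1                  # property 5a: at least t - 1 keys
    if not x.leaf:
        assert len(x.children) == x.n + 1    # property 2: n[x] + 1 children
    return True

root = BTreeNode()
root.keys = [10, 20]
check_node(root, t=2, is_root=True)
```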
The height of a B-tree
The number of disk accesses required for most
B-tree operations is proportional to the
height of the tree. We now analyze the worst-
case height of a B-tree.
Theorem 18.1
If n >= 1, then for any n-key B-tree of height
h and minimum degree t >= 2,
h <= log_t ( (n + 1)/2 )
Proof: 18.1.3
The root contains at least one key and all
other nodes contain at least t - 1 keys. So
there are at least 2 nodes at depth 1, at
least 2t nodes at depth 2, at least 2t^2 nodes
at depth 3, etc., until there are 2*t^(h-1)
nodes at depth h. Figure 18.4, page 440 shows
a tree for h = 3. Thus the number of keys, n,
satisfies the inequality:
n >= 1 + (t - 1) * Sum_{i=1}^{h} 2*t^(i-1)
  = 1 + 2(t - 1)(t^h - 1)/(t - 1)
  = 2*t^h - 1
Adding 1 to both sides, dividing by 2, and
taking logarithms base t finishes the proof.
Here we see the power of B-trees, as compared
to red-black trees. Though the height of both
trees grows as O(lg n), for B-trees the base
of the logarithm is usually much larger. Thus
B-trees save a factor of about lg(t) in the
number of nodes accessed in common operations
since log_t(n) = lg(n)/lg(t). Consequently,
the number of disk accesses is reduced
substantially.
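A quick numerical comparison makes the savings concrete. This sketch evaluates the Theorem 18.1 bound for a billion keys with t = 1001 against the familiar 2*lg(n + 1) worst-case height bound for red-black trees (the chosen n and t are just example values).

```python
import math

# Worst-case heights for n = 10**9 keys: B-tree of minimum degree
# t = 1001 (Theorem 18.1: h <= log_t((n + 1)/2)) versus a red-black
# tree (height <= 2*lg(n + 1)).
n = 10**9
t = 1001
btree_height = math.log((n + 1) / 2, t)
rb_height = 2 * math.log2(n + 1)
print(round(btree_height, 2), round(rb_height, 2))
```

The B-tree height comes out below 3, while the red-black bound is nearly 60, so the B-tree touches a tiny fraction of the nodes per operation.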
18.2 Basic operations on B-trees 18.2.1
In this section, we present the details of
B-TREE-SEARCH, B-TREE-CREATE, and
B-TREE-INSERT. We adopt two conventions:
- The root is always in main memory, so that
a DISK-READ of the root is never needed; but
a DISK-WRITE is required whenever the root
is changed.
- Any nodes passed as parameters must already
have had a DISK-READ performed on them.
The procedures are "one pass" algorithms that
proceed downward from the root of the tree.
Searching a B-tree
This is similar to searching a binary search
tree, except that at each node we make an
(n[x]+1)-way branching decision. In
B-TREE-SEARCH(x,k), x is the root of a subtree
to be searched, and k is the key value; to
search the tree T: B-TREE-SEARCH(root[T],k).
If k is in the tree, the ordered pair (y,i) is
returned, where y is a node and i is an index
with key_i[y]= k; else NIL is returned.
B-TREE-SEARCH(x,k)
1 i <- 1
2 while i <= n[x] and k > key_i[x]
3 do i <- i + 1
4 if i <= n[x] and k = key_i[x]
5 then return (x,i)
6 if leaf[x]
7 then return NIL
8 else DISK-READ(c_i[x])
9 return B-TREE-SEARCH(c_i[x],k)
18.2.2
Using a linear search, lines 1-3 find the
smallest index i such that k <= key_i[x], or
else set i to n[x] + 1. Lines 4-5 check to see if
we have found the key, returning if so. Lines
6-9 either terminate the search unsuccessfully
(if x is a leaf) or recurse to search the
appropriate subtree of x after performing the
necessary DISK-READ on that child. The search
path for key R is shown in Figure 18.1.
B-TREE-SEARCH does O(h) = O(log_t(n))
DISK-READ's. Since n[x] < 2t, the while loop
is O(t), so the total CPU time is O(th) =
O(t log_t(n) ).
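B-TREE-SEARCH transcribes directly into Python. This sketch assumes in-memory nodes with 0-based lists (so the index arithmetic shifts by one relative to the pseudocode) and omits the DISK-READ, which would precede the recursive call.

```python
# In-memory B-TREE-SEARCH: returns the pair (y, i) with y.keys[i] == k,
# or None if k is not in the tree.

class Node:
    def __init__(self, keys, children=None, leaf=True):
        self.keys, self.children, self.leaf = keys, children or [], leaf

def btree_search(x, k):
    i = 0
    while i < len(x.keys) and k > x.keys[i]:     # lines 1-3: linear scan
        i += 1
    if i < len(x.keys) and k == x.keys[i]:       # lines 4-5: found it
        return (x, i)
    if x.leaf:                                   # lines 6-7: unsuccessful
        return None
    return btree_search(x.children[i], k)        # lines 8-9: DISK-READ here

leaf1 = Node([1, 2])
leaf2 = Node([5, 6])
root = Node([4], [leaf1, leaf2], leaf=False)
btree_search(root, 5)    # returns (leaf2, 0)
btree_search(root, 3)    # returns None
```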
Creating an empty B-tree
To build a B-tree, we first use B-TREE-CREATE
to create an empty root node, then add new
keys with B-TREE-INSERT. These procedures
both use ALLOCATE-NODE, which allocates one
disk page as a new node in O(1) time. The new
node requires no immediate DISK-WRITE since
there is no useful information in it yet.
B-TREE-CREATE(T)
1 x <- ALLOCATE-NODE()
2 leaf[x] <- TRUE
3 n[x] <- 0
4 DISK-WRITE(x)
5 root[T] <- x
Inserting a key into a B-tree 18.2.3
Inserting into a B-tree is more complicated
than into a binary search tree, since we can't
just create a new leaf node -- which would
violate the B-tree property. If a leaf node
is not full, we can insert the new key. But
if a node y is full (with 2t - 1 keys), we can
split it around its median key key_t[y] into
two nodes with t - 1 keys each; the median key
moves up to y's parent as the dividing value
between the two nodes. But if y's parent is
also full, it would have to be split, and this
could propagate all the way to the root.
We can insert into a B-tree in a single pass
down the tree, but we must split every full
node on the way to avoid the propagation of
splitting up the tree mentioned above.
- Splitting a node in a B-tree
B-TREE-SPLIT-CHILD has 3 arguments: a nonfull
internal node x, an index i, and a child node
y such that y = c_i[x] is full (both nodes are
assumed to be in main memory). The procedure
splits y in two and adjusts x to have one more
child. To split the root, we first make it
the child of a new empty root node and then
call B-TREE-SPLIT-CHILD. The tree height thus
grows by 1, and this is the only way it can
grow. Figure 18.5, page 444, illustrates the
splitting process.
B-TREE-SPLIT-CHILD(x,i,y) 18.2.4
1 z <- ALLOCATE-NODE()
2 leaf[z] <- leaf[y]
3 n[z] <- t - 1
4 for j <- 1 to t - 1
5    do key_j[z] <- key_{j+t}[y]
6 if not leaf[y]
7    then for j <- 1 to t
8            do c_j[z] <- c_{j+t}[y]
9 n[y] <- t - 1
10 for j <- n[x] + 1 downto i + 1
11    do c_{j+1}[x] <- c_j[x]
12 c_{i+1}[x] <- z
13 for j <- n[x] downto i
14    do key_{j+1}[x] <- key_j[x]
15 key_i[x] <- key_t[y]
16 n[x] <- n[x] + 1
17 DISK-WRITE(y)
18 DISK-WRITE(z)
19 DISK-WRITE(x)
Lines 1-8 copy the largest t - 1 keys and the
corresponding t children from y to z, and line
9 adjusts y's
key count. Lines 10-16 insert z as a child of
x, moving the median key up from y & adjusting
x's key count. Lines 17-19 write disk pages.
The procedure B-TREE-SPLIT-CHILD 18.2.5
takes Theta(t) CPU time due to lines 4-5; the
other loops take O(t) CPU time. The procedure
also performs Theta(1) disk operations.
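The splitting step can be sketched in Python on in-memory, 0-based nodes; list slicing replaces the copy loops, and the DISK-WRITEs of lines 17-19 are omitted. This is an illustrative sketch, not the text's disk-based procedure.

```python
# split_child(x, i, t): split the full child y = x.children[i] around
# its median key, which moves up into x; the larger t - 1 keys (and
# last t children) move into a new node z.

class Node:
    def __init__(self, keys=None, children=None, leaf=True):
        self.keys = keys or []
        self.children = children or []
        self.leaf = leaf

def split_child(x, i, t):
    y = x.children[i]                # full child: 2t - 1 keys
    z = Node(leaf=y.leaf)
    z.keys = y.keys[t:]              # larger t - 1 keys move to z
    median = y.keys[t - 1]           # median key moves up into x
    y.keys = y.keys[:t - 1]          # y keeps the smaller t - 1 keys
    if not y.leaf:
        z.children = y.children[t:]  # last t children move to z
        y.children = y.children[:t]
    x.children.insert(i + 1, z)      # z becomes a child of x
    x.keys.insert(i, median)

t = 2
y = Node([1, 2, 3])                  # full for t = 2
x = Node([10], [y, Node([20, 30])], leaf=False)
split_child(x, 0, t)
print(x.keys)               # [2, 10]
print(y.keys)               # [1]
print(x.children[1].keys)   # [3]
```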
- Inserting a key into a B-tree is done by
B-TREE-INSERT in a single pass down the tree,
requiring Theta(h) disk accesses, and
O(t log_t(n)) CPU time. It avoids inserting
into a full child by using B-TREE-SPLIT-CHILD.
B-TREE-INSERT(T,k)
1 r <- root[T]
2 if n[r] = 2t - 1
3 then s <- ALLOCATE-NODE()
4 root[T] <- s
5 leaf[s] <- FALSE
6 n[s] <- 0
7 c_1[s] <- r
8 B-TREE-SPLIT-CHILD(s,1,r)
9 B-TREE-INSERT-NONFULL(s,k)
10 else B-TREE-INSERT-NONFULL(r,k)
If the root is full, lines 3-9 split it and a
new node s (with 2 children) becomes the root.
This is the only way to increase the height of
a B-tree, and is illustrated by Figure 18.6,
page 446. B-TREE-INSERT finishes by calling
B-TREE-INSERT-NONFULL to insert key k into a
nonfull node. B-TREE-INSERT-NONFULL recurses
down the tree, ensuring that the node to which
it recurses is nonfull by calling
B-TREE-SPLIT-CHILD as necessary.
B-TREE-INSERT-NONFULL inserts key k 18.2.6
into node x, which is assumed to be nonfull --
a precondition guaranteed by B-TREE-INSERT and
maintained by B-TREE-INSERT-NONFULL itself.
B-TREE-INSERT-NONFULL(x,k)
1 i <- n[x]
2 if leaf[x]
3    then while i >= 1 and k < key_i[x]
4            do key_{i+1}[x] <- key_i[x]
5               i <- i - 1
6         key_{i+1}[x] <- k
7         n[x] <- n[x] + 1
8         DISK-WRITE(x)
9    else while i >= 1 and k < key_i[x]
10            do i <- i - 1
11         i <- i + 1
12         DISK-READ(c_i[x])
13         if n[c_i[x]] = 2t - 1
14            then B-TREE-SPLIT-CHILD(x,i,c_i[x])
15                 if k > key_i[x]
16                    then i <- i + 1
17         B-TREE-INSERT-NONFULL(c_i[x],k)
B-TREE-INSERT-NONFULL works as 18.2.7
follows. Lines 3-8 handle the case when x is
a leaf by inserting k into it. If x is not
a leaf, we insert k into a leaf node in the
subtree rooted at x. Lines 9-11 determine the
child of x to which the recursion descends.
Line 13 tests whether the child is full; if so
line 14 splits it, and lines 15-16 determine
which of the two new children to descend to.
(Note: we don't need a DISK-READ(c_i[x]) after
line 16, since the recursion descends to the
child just created by B-TREE-SPLIT-CHILD.)
The net effect of lines 13-16 is to ensure that
we never recurse to a full node. Line 17 then
recurses to insert k into the correct subtree.
Figure 18.7, page 448, shows various cases of
insertion into a B-tree: case (a) shows the
initial tree, case (b) shows insertion into a
nonfull leaf, case (c) shows insertion into a
full leaf, case (d) shows insertion when the
root is full, splitting it, and case (e) shows
another insertion that splits a leaf.
B-TREE-INSERT does Theta(h) DISK-READ's for a
tree of height h, and O(h) DISK-WRITE's. The
total CPU time used is O(th) = O(t log_t(n)).
Since B-TREE-INSERT-NONFULL is tail-recursive,
it can be implemented as a while loop, showing
that the number of pages needed in main
memory at any one time is O(1).
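The whole one-pass insertion (B-TREE-INSERT plus B-TREE-INSERT-NONFULL, with splitting inlined) can be sketched as runnable Python on in-memory, 0-based nodes; DISK-READ/DISK-WRITE are implicit, and the tree is held in a dictionary only so the root can be replaced when it splits. An illustrative sketch, not the disk-based procedures themselves.

```python
# One-pass B-tree insertion: split every full node on the way down,
# so no split ever propagates back up the tree.

class Node:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf

def split_child(x, i, t):
    y = x.children[i]                    # full child with 2t - 1 keys
    z = Node(leaf=y.leaf)
    z.keys = y.keys[t:]                  # larger t - 1 keys go to z
    median = y.keys[t - 1]               # median key moves up into x
    y.keys = y.keys[:t - 1]
    if not y.leaf:
        z.children = y.children[t:]
        y.children = y.children[:t]
    x.children.insert(i + 1, z)
    x.keys.insert(i, median)

def insert_nonfull(x, k, t):
    if x.leaf:                           # lines 3-8: shift keys, insert k
        i = len(x.keys) - 1
        x.keys.append(None)
        while i >= 0 and k < x.keys[i]:
            x.keys[i + 1] = x.keys[i]
            i -= 1
        x.keys[i + 1] = k
    else:                                # lines 9-17: descend, splitting
        i = len(x.keys) - 1
        while i >= 0 and k < x.keys[i]:
            i -= 1
        i += 1
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:
                i += 1
        insert_nonfull(x.children[i], k, t)

def insert(tree, k, t):
    r = tree["root"]
    if len(r.keys) == 2 * t - 1:         # full root: split; height grows
        s = Node(leaf=False)
        s.children.append(r)
        tree["root"] = s
        split_child(s, 0, t)
        insert_nonfull(s, k, t)
    else:
        insert_nonfull(r, k, t)

def inorder(x):                          # keys in sorted order
    if x.leaf:
        return list(x.keys)
    out = []
    for i, c in enumerate(x.children):
        out += inorder(c)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out

tree = {"root": Node()}
for k in [8, 3, 5, 1, 9, 7, 2, 6, 4]:
    insert(tree, k, t=2)                 # t = 2: a 2-3-4 tree
print(inorder(tree["root"]))             # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```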
18.3 Deleting a key from a B-tree 18.3.1
Deletion is a bit more complicated than
insertion into a B-tree, since the deleted key
can be in an internal node (not just a leaf),
requiring rearranging of the node's children.
Also we have to ensure that a node doesn't get
too small (except the root, which is allowed
to have fewer than t - 1 keys). A simple
approach might have to back up if the node
from which the key was deleted had the minimum
number of keys.
We will design B-TREE-DELETE so that when it
is called recursively on a node x, the number
of keys in x is at least t. So sometimes a
key may have to be moved to a child before
recursion can descend to that child. This
allows us to delete a key in one downward pass
without having to "back up" (with 1 exception,
explained later). In the specification below
for deletion, we assume that if the last key
is removed from the root node x (which can
occur in cases 2c and 3b), then x is deleted
and x's only child c_1[x] becomes the new root
(decreasing the height of the tree by one).
We sketch below how deletion works; k is the
key. Figure 18.8, pages 450-451 illustrates
various cases of deleting keys from a B-tree.
18.3.2
1. If k is in a leaf node x, delete k from x.
2. If k is in an internal node x, do this:
a. If the child y that precedes k in x has
at least t keys, find the predecessor k'
of k in the subtree rooted at y.
Recursively delete k', and replace k by k'
in x. (Finding and deleting k' can be
performed in a single downward pass.)
b. Symmetrically, if the child z following k
in x has at least t keys, then find the
successor k' of k in the subtree rooted at
z. Recursively delete k', and replace k
by k' in x. (Finding and deleting k' can
be performed in a single downward pass.)
c. Otherwise, if both y and z have only
t - 1 keys, merge k and all of z into y,
so that x loses both k and the pointer to
z, and y now contains 2t - 1 keys. Then
free z and recursively delete k from y.
3. If k is not in internal node x, determine
the root c_i[x] of the subtree that must
contain k, if k is in the tree at all. If
c_i[x] has only t - 1 keys, execute steps 3a
and 3b as needed to ensure that we descend to
a node with at least t keys. Then finish by
recursing on the appropriate child of x.
a. If c_i[x] has only t - 1 keys but has an
immediate sibling with at least t keys,
give c_i[x] an extra key by moving a key
down from x into c_i[x], moving a key from
the sibling up into x, and moving the
appropriate child pointer from the sibling
into c_i[x].
18.3.3
b. If c_i[x] and both its immediate siblings
have only t - 1 keys, merge c_i[x] with
one sibling, which involves moving a key
from x down into the new merged node to
become the median key for that node.
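Case 3a above, in which a child borrows a key through its parent from a rich sibling, can be sketched on in-memory, 0-based nodes. This illustrative Python fragment handles only borrowing from a left sibling; the symmetric right-sibling case and the rest of B-TREE-DELETE are omitted.

```python
# Case 3a: x.children[i] has only t - 1 keys, but its left sibling
# has at least t. Rotate: a key moves down from x into the child, the
# sibling's last key moves up into x, and the sibling's last child
# pointer moves over.

class Node:
    def __init__(self, keys, children=None, leaf=True):
        self.keys, self.children, self.leaf = keys, children or [], leaf

def borrow_from_left(x, i):
    child, sib = x.children[i], x.children[i - 1]
    child.keys.insert(0, x.keys[i - 1])    # key moves down from x
    x.keys[i - 1] = sib.keys.pop()         # sibling's last key moves up
    if not sib.leaf:                       # sibling's last child moves over
        child.children.insert(0, sib.children.pop())

t = 2
x = Node([5], [Node([1, 3]), Node([7])], leaf=False)
borrow_from_left(x, 1)                     # right child had only t - 1 keys
print(x.keys)               # [3]
print(x.children[0].keys)   # [1]
print(x.children[1].keys)   # [5, 7]
```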
Since most of the keys in a B-tree are in the
leaves, we expect that most of the deletions
take place there. In that case, deletion acts
in one downward pass, without having to back
up. When deleting a key k from an internal
node, the procedure makes a downward pass but
then returns back to the node from which key k
was deleted to replace k with its successor or
predecessor (cases 2a and 2b).
As with insertion, deletion does Theta(h)
DISK-READ's and O(h) DISK-WRITE's. Also, as
with insertion, the total CPU time used is
O(th) = O(t log_t(n)).