Assignment 2: B+ Tree Assignment
Due: Tuesday, November 2, 1999
Instructor: Rich Maclin
In this assignment, you will implement a B+ tree in which leaf-level pages contain entries of the form <key, rid of a data record> (Alternative 2 for data entries, in the textbook's terms). You must implement the full search and insert algorithms as discussed in class. In particular, your insert routine must be capable of dealing with overflows (at any level of the tree) by splitting pages; as per the algorithm discussed in class, you will not consider re-distribution. For this assignment, you can deal with deletes by simply marking the corresponding leaf entry as "deleted"; you do not have to implement merging.
Your page classes will be based on the structures HFPage and SortedPage. The structure files for these classes can be found in:
SortedPage is derived from HFPage, and it augments the insertRecord method of HFPage by storing records on the HFPage in sorted order by a specified key value. The key value must be included as the initial part of each record, to enable easy comparison of the key value of a new record with the key values of existing records on a page. The documentation available in the header files is sufficient to understand what operation each function performs.
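The key-first record layout described above can be illustrated with a minimal sketch. It assumes an integer key and a two-field RID; the RID layout and the helper names makeLeafRecord, keyOf and ridOf are hypothetical and are not part of the provided headers.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Assumed stand-in for the record id type; the real definition
// comes from the course headers and may differ.
struct RID {
    int pageNo;
    int slotNo;
};

// Hypothetical helper: packs <key, rid> into a byte record with the
// key as the initial part, so a sorted page can compare keys cheaply.
std::vector<char> makeLeafRecord(int key, const RID& rid) {
    std::vector<char> rec(sizeof(int) + sizeof(RID));
    std::memcpy(rec.data(), &key, sizeof(int));
    std::memcpy(rec.data() + sizeof(int), &rid, sizeof(RID));
    return rec;
}

// Hypothetical helper: reads the key back from the front of a record.
int keyOf(const std::vector<char>& rec) {
    assert(rec.size() >= sizeof(int));
    int key = 0;
    std::memcpy(&key, rec.data(), sizeof(int));
    return key;
}

// Hypothetical helper: reads the rid stored after the key.
RID ridOf(const std::vector<char>& rec) {
    assert(rec.size() >= sizeof(int) + sizeof(RID));
    RID rid = {0, 0};
    std::memcpy(&rid, rec.data() + sizeof(int), sizeof(RID));
    return rid;
}
```

Because the key sits at offset zero, comparing a new record against an existing one only needs the first bytes of each record, which is what makes sorted insertion on a page straightforward.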
You need to implement two page-level classes, BTIndexPage and BTLeafPage, both of which are derived from SortedPage. These page classes are used to build the B+ tree index; you will write code to create, destroy, open and close a B+ tree index, and to open scans that return all data entries (from the leaf pages) which satisfy some range selection on the keys.
You will carry out this assignment in teams with the same partners as in the previous assignment.
You will need to copy files from the src directory for this assignment. To do this you need to follow the same steps as in the previous assignment:
Copy /usr/local/minibase/mini_hwk/assign/BTree/src to your work directory.
Run make setup, which will copy the appropriate files.
The files found in src include:
You can find other useful include files bt.h, hfpage.h, sorted_page.h, index.h, test_driver.h, btree_driver.h, minirel.h and new_error.h in /usr/local/minibase/minibase-2.0/include.
You should begin by (re-)reading the chapter Tree Structured Indexing of the textbook to get an overview of the B+ tree layer.
Note that key values are passed to functions using void* pointers (pointing to the key values). The contents of a key should be interpreted using the AttrType variable. The key can be either a string (attrString) or an integer (attrInteger), as per the definition of AttrType in minirel.h; only these two kinds of keys are implemented in this assignment. If the key is a string, it has a fixed maximum length, MAX_KEY_SIZE1, defined in bt.h.
Although the specifications for some methods (e.g., the constructor of BTreeFile) suggest that keys can be of (the more general enumerated) type AttrType, you can return an error message if the keys are not of type attrString or attrInteger.
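As a rough illustration of interpreting a void* key through its AttrType, here is a minimal sketch of a comparison routine. The enum, the value of MAX_KEY_SIZE1, and the function name compareKeys are stand-ins chosen for this example; the real definitions are in minirel.h and bt.h and may differ.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for the course definitions (assumed, not copied from
// minirel.h or bt.h).
enum AttrType { attrString, attrInteger, attrReal };
const int MAX_KEY_SIZE1 = 20;  // assumed value for illustration only

// Hypothetical helper: compares two keys of the given type.
// Returns a negative, zero, or positive value (like strcmp).
// Sets *ok to false when the key type is not supported.
int compareKeys(const void* k1, const void* k2, AttrType type, bool* ok) {
    assert(k1 != nullptr && k2 != nullptr && ok != nullptr);
    *ok = true;
    switch (type) {
    case attrInteger: {
        int a = *static_cast<const int*>(k1);
        int b = *static_cast<const int*>(k2);
        return (a < b) ? -1 : (a > b) ? 1 : 0;
    }
    case attrString:
        // Strings are bounded by the fixed maximum key length.
        return std::strncmp(static_cast<const char*>(k1),
                            static_cast<const char*>(k2),
                            MAX_KEY_SIZE1);
    default:
        // Any other key type is reported as an error to the caller.
        *ok = false;
        return 0;
    }
}
```

Centralizing the type switch in one routine keeps the rest of the tree code independent of the key type; in the actual assignment the unsupported case would be reported through the course's error mechanism rather than a bool flag.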
The SortedPage class, which augments the insertRecord method of HFPage by storing records on a page in sorted order according to a specified key value, assumes that the key value is included as the initial part of each record, to enable easy comparison of the key value of a new record with the key values of existing records on a page.
These classes are summarized in Figure 1. Note again that you must not add any private data members to BTIndexPage or BTLeafPage.
For further details about the individual methods in these classes, look at the header files for the classes.
We will assume here that everyone understands the concept of B+ trees, and the basic algorithms, and concentrate on explaining the design of the C++ classes that you will implement.
A BTreeFile will contain a header page and a number of BTIndexPages and BTLeafPages. The header page is used to hold information about the tree as a whole, such as the page id of the root page, the type of the search key, the length of the key field(s) (which has a fixed maximum size in this assignment), etc. When a B+ tree index is opened, you should read the header page first, and keep it pinned until the file is closed. Given the name of the B+ tree index file, how can you locate the header page? The DB class has a method
    Status add_file_entry(const char* fname, PageId header_page_num);
that lets you register this information when a file fname is created. There are also methods for deleting and reading these "file entries" (<file name, header page> pairs), which can be used when the file is destroyed or opened. The header page contains the page id of the root of the tree, and every other page in the tree is accessed through the root page.
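The open-or-create flow implied by the file entries can be sketched with a toy stand-in for the DB class. Only add_file_entry mirrors the signature quoted above; MockDB, lookup_file_entry, allocate_page, the Status values and openOrCreateIndex are hypothetical names invented for this illustration, and the real DB interface should be taken from the course headers.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef int PageId;                  // assumed representation
const PageId INVALID_PAGE = -1;      // assumed sentinel
enum Status { OK, FAIL };            // assumed status values

// Toy stand-in for the DB file directory (not the real DB class).
class MockDB {
public:
    Status add_file_entry(const char* fname, PageId header_page_num) {
        assert(fname != nullptr);
        if (entries_.count(fname)) return FAIL;  // name already registered
        entries_[fname] = header_page_num;
        return OK;
    }
    // Hypothetical lookup; the real method name and signature may differ.
    Status lookup_file_entry(const char* fname, PageId& header_page_num) const {
        std::map<std::string, PageId>::const_iterator it = entries_.find(fname);
        if (it == entries_.end()) return FAIL;
        header_page_num = it->second;
        return OK;
    }
    // Hypothetical page allocator that hands out consecutive page ids.
    PageId allocate_page() { return next_free_++; }
private:
    std::map<std::string, PageId> entries_;
    PageId next_free_ = 0;
};

// Returns the header page id for fname, creating and registering a
// new header page when the index does not exist yet.
PageId openOrCreateIndex(MockDB& db, const char* fname, bool& created) {
    PageId header = INVALID_PAGE;
    if (db.lookup_file_entry(fname, header) == OK) {
        created = false;
        return header;               // existing index: reuse its header
    }
    header = db.allocate_page();
    Status s = db.add_file_entry(fname, header);
    assert(s == OK);
    (void)s;
    created = true;
    return header;
}
```

In the real implementation the header page returned here would then be pinned and kept pinned until the file is closed.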
Figure 2 shows what a BTreeFile with only one BTLeafPage looks like; the single leaf page is also the root. Note that there is no BTIndexPage in this case. Figure 3 shows a tree with a few BTLeafPages, and this can easily be extended to contain multiple levels of BTIndexPages as well.
A BTree is one particular type of index. There are other types, for example a Hash index. However, all index types have some basic functionality in common. We've taken this basic index functionality and created a virtual base class called IndexFile. You won't write any code for IndexFile. However, any class derived from IndexFile should support ~IndexFile(), Delete(), and insert(). (IndexFile and IndexFileScan are defined in index.h.)
Likewise, an IndexFileScan is a virtual base class that contains the basic functionality all index file scans should support.
The main class to be implemented for this assignment is BTreeFile. BTreeFile is a derived class of the IndexFile class, which means a BTreeFile is a kind of IndexFile. However, since IndexFile is a virtual base class, all of the methods associated with IndexFile must be implemented for BTreeFile. You should have copied btfile.h into your directory, as per the instructions in the "Getting Started" section.
The methods to be implemented include:
If a page overflows (i.e., no space for the new entry), you should split the page. You may have to insert additional entries of the form <key, id of child page> into the higher level index pages as part of a split. Note that this could recursively go all the way up to the root, possibly resulting in a split of the root node of the B+ tree.
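The leaf-level part of a split can be sketched in a simplified, in-memory form. This is only an illustration under stated assumptions: keys are integers, capacity is counted in entries (LEAF_CAPACITY) rather than bytes, and the types Entry and Leaf and the function insertIntoLeaf are hypothetical. On a real page the new entry cannot be added before the split because the page is already full, so the order of steps would differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Entry { int key; int rid; };            // simplified <key, rid>
struct Leaf  { std::vector<Entry> entries; };  // kept sorted by key

const std::size_t LEAF_CAPACITY = 4;           // assumed, in entries

static bool keyLess(const Entry& a, const Entry& b) { return a.key < b.key; }

// Inserts e into `left` in key order. If that overflows the leaf,
// moves the upper half into the (empty) leaf `right`, stores the
// first key of `right` in sepKey, and returns true. The caller would
// then insert <sepKey, id of right> into the parent index page,
// which may overflow in turn and split the same way.
bool insertIntoLeaf(Leaf& left, Entry e, Leaf& right, int& sepKey) {
    assert(right.entries.empty());
    std::vector<Entry>::iterator pos =
        std::upper_bound(left.entries.begin(), left.entries.end(), e, keyLess);
    left.entries.insert(pos, e);
    if (left.entries.size() <= LEAF_CAPACITY) return false;  // no split

    std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(left.entries.size() / 2);
    right.entries.assign(left.entries.begin() + mid, left.entries.end());
    left.entries.erase(left.entries.begin() + mid, left.entries.end());
    sepKey = right.entries.front().key;  // copied up, stays in the leaf
    return true;
}
```

For leaf splits the separator key is copied up and remains in the new leaf; a split of an index page would instead push the middle key up to the parent, which this sketch does not cover.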
Finally, you will implement scans that return data entries from the leaf pages of the tree. You should create the scan through a member of BTreeFile, so that you can report an error if a BTreeFile is closed before a scan is completed.
Note that BTreeFileScans should support several kinds of range selections. These ranges are described in btfile.h.
Extra credit of up to 15 points is available. However, the maximum number of points that you can score on this assignment is 80. (So the 15 extra points could be used partially to offset points you lose elsewhere on the assignment.) The main motivation for trying these additional challenges should be the opportunity to write more complete software and understand some of the finer points, rather than to score more points. Do not start on this until you have completed the basic assignment!
The tasks are listed below with the points they are assigned: