CS 4611 (Spring 2001) Program 5

Computer Science 4611
Database Management Systems

Programming Assignment 5
A Simple Relational Algebra Query Language (85 points)
Due Friday, May 4 2001 by 3:30 pm NO LATE PROGRAMS ALLOWED!

Introduction

In programming assignments 2-5, you will implement a simple DBMS. For an overview of how you will construct the DBMS, read this page. For this assignment you will implement some of the query oriented operations of the query language provided for you. You will use the previously provided code for the Disk Space Manager, your code for the Buffer Manager layer, your code for Heap Files, and your code for the System Catalog from previous assignments.

You can find your team assignment here. You should begin by reviewing the book information on implementing query operators. This material has also be covered in class. Most of the code to integrate the query language will be based on code you developed in the Catalog part of this lab. You will need to complete two functions: QLDeleteStatementNode::evaluate and QLQueryItemStatementNode::evaluate which can be found in the file ql_evaluate.C.

To make a local copy of the code you need to implement you should download the file raq.tar.Z. This is a tared archive file. To unpack this file you should do the following:

  % uncompress raq.tar
  % tar xvf raq.tar

This will create a directory RAQueryLanguage that contains the code provided to you as well as skeletons for the code you need to write. This code comes with a provided make file makefile. It is unlikely you will need to modify this file (the only file(s) you should need to change are ql_evaluate.C and possibly ql_parse.h). The directory also includes a number of files that you will likely not need to change (though you may adjust things as you need). These will be discussed below. The code also comes with a testing program ql_tester. This test program will parse a series of query statements in a language discussed below and then attempt to evaluate those statements. When running the program you will type:

  ql_tester QUERY_FILE

If you want to preserve the changes resulting from your queries you should add the name of the database at the end of this line:

  ql_tester QUERY_FILE DATABASE_FILE

This test program is automatically constructed by the makefile by simply typing "make". To recompile all of the code you first type "make clean" which will eliminate all current .o and executable files and then type "make" again to recompile. Note that you will have to copy your buf.h, buf.C, hfpage.h, hfpage.C, heapfile.h, heapfile.C, scan.h, scan.C, attrcatalog.h, attrcatalog.C, relcatalog.h, relcatalog.C, and attrcatalog.o files to this directory for it to compile.

For those who did not get the first two parts of the project working, you can download the archive raqa.tar.Z. This has three libraries with working copies of the earlier code.

A Simple Query Language

Our test program will parse files consisting of a series of query statements. The file ql_batter in the provided directories includes a sample query file using the language described below. The parser will not attempt to evaluate any query until all of the query statements have been parsed. Once the parser has parsed the entire file, it will print its version of the queries parsed and then attempt to evaluate each query.

A statement in our query language will come from the following and end with a semi-colon (;):

SHOW CATALOG
Print the contents of the catalog relations.
SHOW RELATION RelName
Print the contents of the relation RelName.
CREATE RELATION RelName ( Name1 : TYPE1, Name2 : TYPE2, ... )
Create the relation named RelName. The attribute names and their types are listed between the parentheses. The possible types are INT, FLOAT or STRING(Number) where Number is a positive integer indicating the size of the string.
INSERT INTO RelName FROM "FileName"
Load the relation RelName with tuples from the file FileName.
INSERT INTO RelName ( Value1, Value2, ... )
Insert a tuple with values Value1, Value2, ... into relation RelName. Note that the values must be provided in order and have appropriate types for inserting into this relation.
DELETE (Condition) FROM RelName
Delete all of the tuples from RelName where condition Condition is true.
A statement may also consist of a recursively defined query involving SELECT, PROJECT and JOIN statements (and corresponding conditions). Query items are constructed recursively using the following possible components:
- SELECT (Condition) (QueryItem)
  Select those tuples from the query QueryItem that match the condition.
- PROJECT (RV1.AN1,RV2.AN2,...) (QueryItem)
  Project the attributes RV1.AN1, RV2.AN2, ... from those tuples produced by QueryItem. The RV values are relation variables introduced as shown below. The AN values are attribute names.
- JOIN (Condition) (QueryItem1) (QueryItem2)
  Join those tuples produced by QueryItem1 and QueryItem2 where the condition Condition is true. Note, the resulting tuples should include all of the attributes from both relations.
- [ RelationName RelationVariable ]
  Defines the relation variable RelationVariable which is connected to the relation RelationName.

The conditions used in DELETE, SELECT and JOIN conditions are also recursively defined using various components. A condition may be:

An Integer, Float or String Literal (as in 7, 6.6 or "John")
The value of RelationVariable.AttributeName (as in P.Name)
Any condition contained in parentheses: (Condition)
A two operand operator of the form: LeftArgument OP RightArgument
Possible operators include == (equals), != (not equals), < (lesser), > (greater), <= (lesser or equal), >= (greater or equal), AND (logical AND) and OR (logical OR)
The one operand operator: NOT Argument

Code to implement the scanning and parsing aspect of this query language can be found in the files ql_scan.C and ql_parse.C and the corresponding .h files ql_scan.h and ql_parse.h. The code for implementing these operations as queries can be found in ql_evaluate.C.

Completing the Provided Code

You will be implementing the DELETE, SELECT, PROJECT and JOIN operations. The relevant class structures for these operations include QLDeleteStatementNode (DELETE), QLSelectQueryItem (SELECT), QLProjectQueryItem (PROJECT), and QLJoinQueryItem (JOIN). You will also need to look at the class QLQueryItemStatementNode, and the classes used to define Conditions and the attribute list for PROJECT (QLAttributeListNode).

In order to execute these operations you will need to further check the parsed operations (not all of the checking has been done). For the DELETE, SELECT and JOIN operations you need to check the provided condition. Things to check:

The final result of the condition should be an integer.
Any RelationVariable.AttributeName combination should correspond to a relation variable and attribute name produced by the arguments to the operation (e.g. SELECT (T.ID == G.A) FROM ([Teams T]) would have two problems if ID is not an attribute of the relation Teams (the other problem is that the relation variable G is not defined).
The comparison operators (e.g., ==, !=, etc.) must be applied to arguments of the same type (INT and INT, etc.). The one exception you make should be to allow comparisons of INT values and FLOATs.
The logical operators (AND, OR, NOT) must be applied only to integer arguments.

You can define the operation to check this process recursively (to check a two operand operation call your routine recursively on each argument to get its type, then compare the types that are returned). If a condition violates the type requirements you may simply terminate the evaluation process.

Similarly, you will need to create a routine that calculates whether a condition is true based on a tuple provided to that routine. This routine can also be written recursively similarly to how you write the type checking routine for conditions.

In order to construct the query processing routines you may want to add to the existing classes or define your own class structure. Queries involving SELECT, PROJECT and JOIN can be answered by constructing a structure similar to the Scan structure for each type of query item in a query tree. For example, a query item of the form [Relation Variable] can be made to correspond to a scan that produces tuples from Relation one at a time (using the getNext feature). A SELECT is then hooked to a scan that produces tuples and the SELECT's getNext routine runs until one of the tuples provided to the SELECT meets the Condition. The PROJECT and JOIN operations are similarly defined. Note that the JOIN may be implemented using a simple nested loops approach.

Error Protocol and Debugging

Be sure to follow the error protocol described in new_error.h. Note that you will likely want to add new error codes and new error messages to the tables provided for you.

The make file compiles the code using the -g flag. This means that you can debug the executables produced using gdb. I have also set up the code with a command line debugging system. When running either of the test programs you can add command line arguments of db, bm, hf, or gory. These turn on debugging flags in the Disk Space module (db), Buffer Manager module (bm), HeapFile (hf) and some extended (gory details) flags (gory). Note that you may want to add debugging commands in your Buffer Manager code following this protocol.

What to Turn In, and When

Print out your versions of all of the files you changed. You should test your code using the test routine ql_tester and print out the results. Try to completely test your code. In your test directory I have included a sample query file ql_batter. This set of commands is a starting point. You should add more test files and comprehensively test your code and turn in these results.

Once you have completed testing, write up a team report of how your code is implemented. This report should give an overview of how you completed the evaluation code and any changes you made to the classes provided and any new classes you added. It should also discuss the algorithms you used to solve the problem. This report should be at least two pages long but no longer than four pages. Each team member should also write up an individual report (at least half a page but no more than a page) discussing their contributions to the coding process and how the overall team interaction went.

Computer Science 4611 Database Management Systems Programming Assignment 5 A Simple Relational Algebra Query Language (85 points) Due Friday, May 4 2001 by 3:30 pm NO LATE PROGRAMS ALLOWED!

Introduction

A Simple Query Language

Completing the Provided Code

Error Protocol and Debugging

What to Turn In, and When

Computer Science 4611
Database Management Systems

Programming Assignment 5
A Simple Relational Algebra Query Language (85 points)
Due Friday, May 4 2001 by 3:30 pm NO LATE PROGRAMS ALLOWED!