Computer Science 5641
Compiler Design
Project Part 1 (15 points)
Due September 29, 2009

Over the course of the semester we will be implementing a working interpreter for a simple language. The project will be divided into several smaller parts, each building on previous parts. In this first section we will begin with something (hopefully) straightforward and give you a chance to get comfortable with programming.

Scanner Support

One of the simple but important tasks in implementing a compiler or interpreter is managing the identifiers supplied by the user. This is generally done with a hash table of identifiers, with the characters of the strings themselves kept in a string space to manage memory. This arrangement makes lookup and comparison of strings very fast. For this part you will implement a StringSpace class and a StringTable class as described below:

StringSpace - a string space consists of a series of pages of data, each holding a series of unique strings. The string space is used by the compiler whenever a user supplies an identifier (and we will also use it for reserved words). If the string is new, the compiler adds it to the top page of data and from that point on uses a pointer to the page and the offset on that page as the unique reference to that string (so that string comparison operations can mostly be avoided). You should implement this data structure.

A StringSpace object maintains (internally) a set of data pages on which it stores strings (as many as will fit on each page). For a real system, the size of the page would be determined by system parameters (page sizes in memory and/or on disk). For our work, the page size should be a parameter used to create the initial StringSpace object (you can set the page size small when testing your programs and increase the size later when using this module). A StringSpace provides one function to its users: an insert_string function that takes a pointer to a string, stores the string on some page, and returns a reference to where the string is stored in the form of a related data structure called a StringSpaceEntry. A StringSpaceEntry maintains (internally) a pointer to the page of data containing the string and an offset on that page where the string can be found. A StringSpaceEntry should include a function to return the actual string corresponding to that entry, as well as functions to compare that entry to a new string (and possibly to another StringSpaceEntry).

Data in the string space is maintained in pages. Each page has an array of characters it can use to store one or more strings. When a request to insert a string comes in, you should add the string to the top page if there is enough space; otherwise, add a new page as the top page and then add the string to that page.
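The paging scheme described above can be sketched as follows. This is only one possible layout, not a required design. Apart from StringSpace, StringSpaceEntry, and insert_string, every name here (c_str, equals, page_count, the private members) is an assumption made for illustration. The sketch also assumes C++11 (std::unique_ptr), that strings are stored NUL-terminated, and that a string longer than a page is rejected with an exception; the assignment does not specify any of these.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <vector>

// Reference to a stored string: the page it lives on plus an offset into it.
class StringSpaceEntry {
public:
    StringSpaceEntry(const char* page, std::size_t offset)
        : page_(page), offset_(offset) {}

    // Returns the NUL-terminated string this entry refers to (hypothetical name).
    const char* c_str() const { return page_ + offset_; }

    // Compares the stored string with a new string.
    bool equals(const char* s) const { return std::strcmp(c_str(), s) == 0; }

    // Two entries match if they name the same location or hold the same text.
    bool equals(const StringSpaceEntry& other) const {
        return (page_ == other.page_ && offset_ == other.offset_) ||
               equals(other.c_str());
    }

private:
    const char* page_;
    std::size_t offset_;
};

class StringSpace {
public:
    // The page size is a constructor parameter so it can be small in tests.
    explicit StringSpace(std::size_t page_size)
        : page_size_(page_size), used_(0) {
        assert(page_size > 0);
    }

    // Copies s onto the top page, starting a new top page if it does not fit.
    StringSpaceEntry insert_string(const char* s) {
        const std::size_t needed = std::strlen(s) + 1;  // include the NUL
        if (needed > page_size_) {
            // Assumption: oversized strings are an error in this sketch.
            throw std::length_error("string does not fit on one page");
        }
        if (pages_.empty() || used_ + needed > page_size_) {
            pages_.push_back(std::unique_ptr<char[]>(new char[page_size_]()));
            used_ = 0;
        }
        char* page = pages_.back().get();
        const std::size_t offset = used_;
        std::memcpy(page + offset, s, needed);
        used_ += needed;
        return StringSpaceEntry(page, offset);
    }

    // Number of pages allocated so far (hypothetical helper for testing).
    std::size_t page_count() const { return pages_.size(); }

private:
    std::size_t page_size_;
    std::size_t used_;  // bytes used on the top page
    std::vector<std::unique_ptr<char[]>> pages_;
};
```

With a page size of 16, inserting "while", "count", and "total" in that order fills 12 bytes of the first page and then forces a second page, since the third string needs 6 bytes and only 4 remain. Note that this sketch does not itself check for duplicates; it assumes the caller looks the string up elsewhere before inserting it.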

StringTable - a string table is a hash table used to store strings encountered by the compiler. For our work we will also use it to recognize keywords (to make scanning a bit easier). When a string is encountered during scanning, the compiler will look up that string in the StringTable. If the string is not already in the table, it will be added to the StringTable. A string is added to the string table by adding the string to the string space and then using the resulting StringSpaceEntry as the data for the hash table. Your hash entry should also include a token number. You should plan on implementing the hash table using linked lists to deal with collisions. The number of buckets in the hash table should be a parameter used in creating the StringTable so that you can set it small for testing and make it larger later. For the moment, the token number corresponding to a hash table entry can be an arbitrary value. During scanning we will start by inserting all of the reserved words and their corresponding token numbers into the StringTable. Then, when we find an identifier in the program, we will determine whether that identifier is a reserved word by looking it up in the string table.
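A chained hash table of this kind might look like the sketch below. To keep the example self-contained, each node stores a std::string where the real project would store a StringSpaceEntry; that substitution, the HashNode struct, the lookup and insert names, and the simple multiply-by-31 hash function are all assumptions for illustration, not requirements of the assignment. It assumes C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One node in a bucket's linked list. In the project, `text` would be a
// StringSpaceEntry; std::string is a stand-in for this sketch only.
struct HashNode {
    std::string text;
    int token;
    HashNode* next;
};

class StringTable {
public:
    // The bucket count is a constructor parameter so it can be small in tests.
    explicit StringTable(std::size_t bucket_count)
        : buckets_(bucket_count, nullptr) {
        assert(bucket_count > 0);
    }

    ~StringTable() {
        for (HashNode* head : buckets_) {
            while (head != nullptr) {
                HashNode* next = head->next;
                delete head;
                head = next;
            }
        }
    }

    // The table owns raw nodes, so copying is disabled.
    StringTable(const StringTable&) = delete;
    StringTable& operator=(const StringTable&) = delete;

    // Returns the node holding s, or nullptr if s is not in the table.
    const HashNode* lookup(const std::string& s) const {
        for (const HashNode* n = buckets_[hash(s)]; n != nullptr; n = n->next) {
            if (n->text == s) return n;
        }
        return nullptr;
    }

    // Returns the existing node for s, or adds s with the given token
    // at the head of its bucket's chain and returns the new node.
    const HashNode* insert(const std::string& s, int token) {
        if (const HashNode* found = lookup(s)) return found;
        const std::size_t b = hash(s);
        buckets_[b] = new HashNode{s, token, buckets_[b]};
        return buckets_[b];
    }

private:
    // Simple polynomial hash; any reasonable string hash would do here.
    std::size_t hash(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) h = h * 31 + c;
        return h % buckets_.size();
    }

    std::vector<HashNode*> buckets_;
};
```

In this sketch, reserved words would be inserted first with their token numbers. A later insert of the same text returns the original node with its token unchanged, which is how a scanner could tell that an identifier is really a keyword. Creating the table with only one or two buckets forces collisions and so exercises the linked lists during testing.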

Implementation Details - you should implement your code in C++, but make it as general as possible, as you may use something similar in Java eventually. I recommend using an IDE; one good example is the Eclipse IDE, which you can download at http://www.eclipse.org . If you wish to use it on a Windows machine, you can also download Cygwin at http://www.cygwin.com . During installation, make sure to go through the package selection step and select gcc, g++, make, and gdb. In addition, after setting it up you will need to add the Cygwin bin directory to your path (to allow access to these tools).

Testing - make sure to test your code carefully (as your later code will depend on it). You should plan to implement test programs for both your string space and string table implementations and submit the results of this testing to demonstrate that your program is working.

Writeup - document your code and your testing. For this part of the project you will be working separately. Later we will introduce teams as part of the project.