Computer Science 5641
Compiler Design
Project Part 2 - The Scanner (60 points)
Due Tuesday, October 19, 2004

Introduction

In this part of the project you will construct two separate scanners that recognize the tokens described below. You will build the first scanner using flex. The second scanner should be an implementation of the transducer you constructed in the first part of this project. Try to make the two scanners as similar as possible. You should also plan to construct several test files and to test both scanners on each test file.

The Token Set

Punctuation:    ;    ,    (    )    {    }

Operators:    =    +    -    *    /    ==    !=    <    >    <=    >=    !    &&    ||   <<   >>   .

Reserved words: char int float if else fi struct while elihw (case matters)

Identifier: a letter followed by 0 or more letters, digits, or underscore (_) characters

Character constant: a single typed character between single quotes ('). The newline character cannot appear between the single quotes, nor can a tab, the single quote character itself, or the backslash character (\). These characters are represented (respectively) by '\n' (newline), '\t' (tab), '\'' (single quote) and '\\' (backslash).
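
As an illustration of the character constant rule, here is a minimal sketch of how a hand-written scanner might check one. The function name scan_char_const and its return convention (characters consumed, or -1 on a malformed constant) are assumptions made for this example and are not part of the assignment.

```c
/* Hypothetical helper: scan a character constant that starts at s[0].
 * On success, stores the character's value in *value and returns the
 * number of input characters consumed (3 or 4). Returns -1 if the
 * input is not a well-formed character constant. */
int scan_char_const(const char *s, char *value)
{
    char c;
    int i = 2;                      /* index where the closing quote should be */

    if (s[0] != '\'')
        return -1;
    c = s[1];
    /* These characters may not appear literally between the quotes. */
    if (c == '\0' || c == '\n' || c == '\t' || c == '\'')
        return -1;
    if (c == '\\') {                /* one of the four escape sequences */
        switch (s[2]) {
        case 'n':  c = '\n'; break;
        case 't':  c = '\t'; break;
        case '\'': c = '\''; break;
        case '\\': c = '\\'; break;
        default:   return -1;       /* unknown escape */
        }
        i = 3;
    }
    if (s[i] != '\'')               /* missing closing quote */
        return -1;
    *value = c;
    return i + 1;
}
```

With this sketch, the input 'a' consumes three characters, while '\n' consumes four and yields the newline character.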

Integer constant: one or more digit characters

Float constant: one or more digit characters, followed by a period (.), followed by one or more digit characters
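
Because the period (.) is also an operator, a scanner has to look one character past the period before deciding that it belongs to a float constant. The sketch below shows one way to do that by hand. The token codes and the function name scan_number are hypothetical; the assignment does not fix token numbers.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes, chosen only for this example. */
enum { TOK_NONE = 0, TOK_INT = 1, TOK_FLOAT = 2 };

/* Scan a numeric constant at the start of s. Stores the number of
 * characters consumed in *len and returns the kind of token found. */
int scan_number(const char *s, size_t *len)
{
    size_t i = 0;

    while (isdigit((unsigned char)s[i]))
        i++;
    if (i == 0) {                   /* no leading digit: not a number */
        *len = 0;
        return TOK_NONE;
    }
    /* The period is part of the constant only if a digit follows it;
     * otherwise it is left for the next token (the . operator). */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i]))
            i++;
        *len = i;
        return TOK_FLOAT;
    }
    *len = i;
    return TOK_INT;
}
```

Under these assumptions, the input 12.x is scanned as the integer constant 12, and the period is left in the input to be recognized as an operator.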

String constant: a double quote character followed by 0 or more string components ending with a double quote character. A component of a string can include any character except a newline character, a tab character, a backslash character or a double quote character. These characters are represented (respectively) by \n (newline), \t (tab), \\ (backslash) and \" (double quote).
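
The string constant rule can be sketched as a small loop that copies ordinary characters and translates the four escape sequences. The function name scan_string, its buffer parameters, and its return convention are assumptions made for this example; your transducer may organize this differently.

```c
#include <stddef.h>

/* Hypothetical helper: scan a string constant that starts at s[0].
 * The decoded text is written to out (capacity cap, including the
 * terminating NUL). Returns the number of input characters consumed,
 * or -1 if the string is malformed, unterminated, or does not fit. */
long scan_string(const char *s, char *out, size_t cap)
{
    size_t i = 1;                   /* position in the input */
    size_t o = 0;                   /* position in the output */

    if (cap == 0 || s[0] != '"')
        return -1;
    for (;;) {
        char c = s[i];
        if (c == '"')               /* closing quote ends the constant */
            break;
        /* End of input, newline and tab may not appear literally. */
        if (c == '\0' || c == '\n' || c == '\t')
            return -1;
        if (c == '\\') {            /* translate an escape sequence */
            switch (s[i + 1]) {
            case 'n':  c = '\n'; break;
            case 't':  c = '\t'; break;
            case '\\': c = '\\'; break;
            case '"':  c = '"';  break;
            default:   return -1;   /* unknown escape */
            }
            i++;
        }
        if (o + 1 >= cap)           /* keep room for the NUL */
            return -1;
        out[o++] = c;
        i++;
    }
    out[o] = '\0';
    return (long)(i + 1);           /* include the closing quote */
}
```

Returning -1 for an unterminated string gives the caller a single place to issue the error message and decide where scanning resumes.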

Notes on Scanning

For the flex version of the scanner, if the user fails to complete a string or character constant, you should issue an error message after recognizing the character that starts the string or character constant, then resume scanning for tokens at the next character. For your transducer you may ignore all of the characters up to the next character that ends a string.

Recognizing Reserved Words

To recognize reserved words you should make use of your string table. In your code you should declare a global string table and insert into it all of the reserved words with their corresponding token numbers. Then, when you encounter an identifier, look it up in the global string table. If it is in the table, return the token number associated with it; otherwise, add the identifier to the string table with the identifier token number and return that number. This lookup should be part of the action associated with an identifier in your flex definition.
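
The lookup described above might look roughly like the sketch below. It uses a small fixed-size array as a stand-in for your own string table; the names st_init, st_insert and lookup_identifier, the token codes, and the size limits are all hypothetical.

```c
#include <string.h>

/* Hypothetical token numbers; use the ones your project defines. */
enum { TOK_CHAR = 1, TOK_INT_KW, TOK_FLOAT_KW, TOK_IF, TOK_ELSE,
       TOK_FI, TOK_STRUCT, TOK_WHILE, TOK_ELIHW, TOK_ID };

#define MAX_ENTRIES 256
#define MAX_NAME    32

struct entry { char name[MAX_NAME]; int token; };

/* Global string table (a simple stand-in for your own). */
static struct entry table[MAX_ENTRIES];
static int n_entries = 0;

/* Return the index of name in the table, or -1 if it is absent. */
static int st_find(const char *name)
{
    int i;
    for (i = 0; i < n_entries; i++)
        if (strcmp(table[i].name, name) == 0)   /* case matters */
            return i;
    return -1;
}

/* Add name with the given token number. Returns the token number,
 * or -1 if the table is full or the name is too long. */
int st_insert(const char *name, int token)
{
    if (n_entries >= MAX_ENTRIES || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(table[n_entries].name, name);
    table[n_entries].token = token;
    n_entries++;
    return token;
}

/* Preload the reserved words before any scanning takes place. */
void st_init(void)
{
    n_entries = 0;
    st_insert("char", TOK_CHAR);     st_insert("int", TOK_INT_KW);
    st_insert("float", TOK_FLOAT_KW); st_insert("if", TOK_IF);
    st_insert("else", TOK_ELSE);     st_insert("fi", TOK_FI);
    st_insert("struct", TOK_STRUCT); st_insert("while", TOK_WHILE);
    st_insert("elihw", TOK_ELIHW);
}

/* Called when the scanner has matched an identifier: reserved words
 * return their own token; anything else is entered as an identifier. */
int lookup_identifier(const char *name)
{
    int i = st_find(name);
    if (i >= 0)
        return table[i].token;
    return st_insert(name, TOK_ID);
}
```

In a flex definition, the action for the identifier pattern could then be as short as returning lookup_identifier(yytext), assuming st_init() has been called before scanning begins.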

What To Turn In

Turn in documented versions of all of your code (including test code). Also document your test cases and show results from both methods on your test files. You should also write a team report on this part of the project and in addition submit a short individual report from each member of the team.