Interpreter

Intent

Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language.

Participants

AbstractExpression - declares an abstract interpret operation.
TerminalExpression - implements the interpret operation for terminal (leaf) expressions in the language.
NonterminalExpression - implements the interpret operation for nonterminal expressions in the language.
Context - contains information that is global to the interpreter.
Client - builds the abstract syntax tree (parse tree) and invokes the interpret operation.
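
As a rough illustration of how these participants fit together, here is a minimal Python sketch. The class names Number, Variable, and Add are hypothetical stand-ins for terminal and nonterminal expressions, and the Context is assumed to hold nothing more than variable values.

```python
from abc import ABC, abstractmethod


class Context:
    """Information global to the interpreter; here, only variable values (an assumption)."""

    def __init__(self, variables=None):
        self.variables = dict(variables or {})


class AbstractExpression(ABC):
    @abstractmethod
    def interpret(self, context):
        """Evaluate this node against the given context."""


class Number(AbstractExpression):  # a TerminalExpression (hypothetical name)
    def __init__(self, value):
        self.value = value

    def interpret(self, context):
        return self.value


class Variable(AbstractExpression):  # a TerminalExpression (hypothetical name)
    def __init__(self, name):
        self.name = name

    def interpret(self, context):
        return context.variables[self.name]


class Add(AbstractExpression):  # a NonterminalExpression (hypothetical name)
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def interpret(self, context):
        return self.left.interpret(context) + self.right.interpret(context)


# Client: builds the tree for "x + 2" by hand and invokes interpret.
tree = Add(Variable("x"), Number(2))
result = tree.interpret(Context({"x": 40}))
print(result)  # 42
```

The tree is built by hand here because the pattern itself does not say where the tree comes from.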

Notes

Two important problems in building an interpreter are not addressed in GHJV: defining the language and defining the grammar for the language. GHJV takes the grammar as given.

Defining the Language

In a given design context, some aspects of the language are fixed. For example, if you are designing an interpreter for numerical expressions, you want to stay close to what people expect expressions to look like. But suppose you want to interpret expressions that contain variables and you want to allow variables to have complex names. It is legitimate to impose restrictions on the names to simplify the interpretation process.

For example, most programming languages require that an identifier (a terminal expression for naming things) consists only of alphanumeric characters and that the first character must be alphabetic. Excluding non-alphanumeric characters avoids ambiguities arising from allowing symbols with other meanings, such as the operator symbols, to appear in identifiers. Requiring that the first character be alphabetic simplifies distinguishing identifiers from numeric literals.
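
The identifier rule above can be sketched with two regular expressions. This is only an illustration: it assumes ASCII letters and digits and unsigned integer literals, and the function name classify is hypothetical.

```python
import re

# Assumed rule: first character alphabetic, remaining characters alphanumeric (ASCII only).
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# Assumed numeric literal: one or more digits.
NUMBER = re.compile(r"[0-9]+")


def classify(lexeme):
    """Return 'identifier', 'number', or 'invalid' for a single lexeme."""
    if IDENTIFIER.fullmatch(lexeme):
        return "identifier"
    if NUMBER.fullmatch(lexeme):
        return "number"
    return "invalid"


print(classify("rate2"))  # identifier
print(classify("42"))     # number
print(classify("2rate"))  # invalid: starts with a digit
print(classify("a+b"))    # invalid: contains an operator symbol
```

Because an identifier can never start with a digit, the first character alone tells the two kinds of lexeme apart.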

Defining the Grammar

A language can have many grammars. A carefully chosen grammar can make parsing the language much simpler. The best choice of a grammar also depends on the type of tools that you know how to work with. One grammar may be more suitable for use with automatic parser generators. Another may be more suitable if you have to write the parsing code without an automatic parser generator.
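
To make this concrete, consider a small assumed example that is not taken from GHJV: chains of subtractions such as 10 - 3 - 2. The comments in the sketch below give two grammars for that language. The left-recursive form is the kind that LR-style parser generators handle well, while the iterative form maps directly onto a hand-written loop. The function name evaluate_expr is hypothetical, and the input is assumed to be already split into tokens.

```python
# Two grammars for the same language of subtraction chains:
#   (a) expr -> expr '-' NUMBER | NUMBER      left-recursive
#   (b) expr -> NUMBER ('-' NUMBER)*          iterative form
# A function written directly from (a) would call itself before consuming
# any token and never terminate, so hand-written code follows (b) instead.


def evaluate_expr(tokens):
    """Evaluate a token list such as ['10', '-', '3', '-', '2'] using grammar (b)."""
    position = 0
    value = int(tokens[position])
    position += 1
    while position < len(tokens):
        if tokens[position] != "-":
            raise SyntaxError(f"expected '-', found {tokens[position]!r}")
        if position + 1 >= len(tokens):
            raise SyntaxError("expected a number after '-'")
        value -= int(tokens[position + 1])  # subtract as we go: left-associative
        position += 2
    return value


print(evaluate_expr(["10", "-", "3", "-", "2"]))  # 5
```

Both grammars accept exactly the same sentences; they differ only in how easily a given tool or technique can turn them into a parser.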

Abstract Syntax Trees

According to GHJV, the Interpreter pattern does not explain how to create an abstract syntax tree (also known as a parse tree), only how to perform operations with it. The parse tree itself is just an example of the Composite design pattern. As the authors point out in one of the implementation notes, a Visitor design pattern can be used to perform the operations.

The Big Picture

Normally, the creation of a parse tree must be considered as part of the design of an interpreter. When this is done, the best design usually involves at least two major components.

A scanner (or lexical analyser)
A scanner groups characters into tokens, which are low-level units of meaning (TerminalExpressions) in the source text.
A parser
A parser builds an internal representation of the text structure. This representation is called a parse tree (a Composite). The parser adds NonterminalExpressions to the TerminalExpressions found by the scanner.
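
The division of labour between the two components can be sketched as follows. This is a minimal illustration under assumptions of its own: the language has only numbers, names, and the operators + and -, the tree is represented with nested tuples rather than expression classes, and the function names scan and parse are hypothetical.

```python
import re

# One alternative per token kind; whitespace matches nothing and is skipped.
TOKEN = re.compile(r"(?P<number>\d+)|(?P<name>[A-Za-z][A-Za-z0-9]*)|(?P<op>\S)")


def scan(text):
    """Scanner: group characters into (kind, lexeme) tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]


def parse(tokens):
    """Parser: build a tree for  expr -> operand (('+' | '-') operand)*."""
    position = 0

    def operand():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, lexeme = tokens[position]
        if kind == "op":
            raise SyntaxError(f"expected an operand, found {lexeme!r}")
        position += 1
        return (kind, lexeme)  # leaf: a terminal expression from the scanner

    tree = operand()
    while position < len(tokens):
        kind, lexeme = tokens[position]
        if lexeme not in ("+", "-"):
            raise SyntaxError(f"unexpected token {lexeme!r}")
        position += 1
        tree = (lexeme, tree, operand())  # interior node: a nonterminal expression
    return tree


tokens = scan("x + 2 - y")
tree = parse(tokens)
print(tokens)
print(tree)
```

The scanner knows nothing about structure, and the parser never looks at individual characters, which keeps each component simple.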

If the interpreted language has variables, types, functions, or procedures, then an interpreter needs an additional type of major component.

Symbol tables
A symbol table is used to record information about identifiers. An identifier is a type of token that is used for naming variables, types, functions, and procedures. The symbol table is an important part of the Context participant in the Interpreter design pattern.
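
A symbol table can be as simple as a dictionary wrapped in a class. The sketch below is one possible shape, not a prescribed one: the method names declare and lookup, and the choice to record a kind and a value for each identifier, are assumptions made for illustration.

```python
class SymbolTable:
    """Records information about identifiers; here, a kind and a value (an assumption)."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, kind, value=None):
        if name in self._entries:
            raise NameError(f"{name!r} is already declared")
        self._entries[name] = {"kind": kind, "value": value}

    def lookup(self, name):
        try:
            return self._entries[name]
        except KeyError:
            raise NameError(f"{name!r} is not declared") from None


class Context:
    """Interpreter-wide state; the symbol table is its main component."""

    def __init__(self):
        self.symbols = SymbolTable()


context = Context()
context.symbols.declare("rate", kind="variable", value=0.05)
context.symbols.declare("square", kind="function", value=lambda x: x * x)

print(context.symbols.lookup("rate")["kind"])        # variable
print(context.symbols.lookup("square")["value"](3))  # 9
```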

Finally, if there are multiple interpret operations that are applicable to a given parse tree, then a Visitor design pattern should be used. This introduces another kind of major component.

Visitors
Each Visitor implements a single interpret operation.
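
A minimal sketch of this arrangement, assuming a tree with only two node types: the node classes Number and Add and the visitors Evaluator and Printer are hypothetical names chosen for illustration. Each node exposes an accept method, and each visitor supplies one operation over the whole tree.

```python
class Number:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visit_number(self)


class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_add(self)


class Evaluator:
    """Visitor implementing one interpret operation: compute the value."""

    def visit_number(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)


class Printer:
    """Visitor implementing another operation: render the tree as text."""

    def visit_number(self, node):
        return str(node.value)

    def visit_add(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"


tree = Add(Number(1), Add(Number(2), Number(3)))
value = tree.accept(Evaluator())
text = tree.accept(Printer())
print(value)  # 6
print(text)   # (1 + (2 + 3))
```

Adding a third operation means writing a third visitor class; the node classes stay unchanged.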