Language translation generally involves at least two major components: a scanner and a parser.

If the interpreted language has variables, types, functions, or procedures, then an interpreter needs an additional major component: a symbol table.

A scanner (or lexical analyser) groups characters into tokens, which are low-level units of meaning in the source code. Tokens are similar to words and punctuation marks in natural languages.

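A scanner typically provides an iterator interface for use by the parser. The interface has an implicit current position in the source code text, which is either at a token or beyond the end of the text. It provides operations for testing whether the current position is beyond the end of the text, retrieving the token at the current position, and advancing the current position to the next token.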

Scanners will often omit the first operation, the end-of-text test. Instead, when the current position is beyond the end of the text, they return a special end token as the token at the current position. For some languages the scanner may also need to retrieve tokens beyond the current position without advancing. This is called lookahead.
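The interface described above can be sketched as follows. This is a minimal illustration, not a definitive scanner: the class name `SimpleScanner`, the `peek` method, the `"end"` token type, and the decision to tokenize the whole text eagerly in the constructor are all assumptions made for the example. The token type names `identifier` and `numeric-literal` follow the parser code later in these notes.

```java
import java.util.ArrayList;
import java.util.List;

/** A token: a type tag plus the matched source text. */
final class Token {
    private final String type;
    private final String text;

    Token(String type, String text) {
        this.type = type;
        this.text = text;
    }

    String getType() { return type; }
    String getText() { return text; }
}

/** Hypothetical scanner for identifiers, integer literals, and one-character symbols. */
final class SimpleScanner {
    static final String END = "end";  // assumed name for the end token type
    private final List<Token> tokens = new ArrayList<>();
    private int position = 0;         // the implicit current position

    SimpleScanner(String source) {
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;  // whitespace separates tokens but is not itself a token
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < source.length()
                        && (Character.isLetterOrDigit(source.charAt(i)) || source.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(new Token("identifier", source.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(new Token("numeric-literal", source.substring(start, i)));
            } else {
                // Any other character becomes a one-character token whose type is its text.
                tokens.add(new Token(String.valueOf(c), String.valueOf(c)));
                i++;
            }
        }
    }

    /** Token at the current position, or the end token when past the end of the text. */
    Token getToken() { return peek(0); }

    /** Lookahead: the token k positions past the current one, without advancing. */
    Token peek(int k) {
        int index = position + k;
        return index < tokens.size() ? tokens.get(index) : new Token(END, "");
    }

    /** Move to the next token; has no effect once past the end of the text. */
    void advance() {
        if (position < tokens.size()) {
            position++;
        }
    }
}
```

With this sketch, scanning `f(x, 42)` yields the tokens `f`, `(`, `x`, `,`, `42`, `)`, and every call to `getToken()` after the last `advance()` returns a token of type `end`, so the parser never needs a separate end-of-text test.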

A symbol table is used to record information about identifiers, one of the types of tokens in the source code. Identifiers are used for naming variables, types, and subprograms, so symbol tables must be capable of recording different kinds of information. Symbol tables must also be structured to reflect the different kinds of name scopes in the language.

Many high-level languages have some form of block structuring, which determines a variety of nested scopes for the identifiers. Since the meaning of an identifier can depend on the scope, symbol tables are usually organized as a stack of simple tables. A simple table is pushed onto the stack when parsing enters a new scope and popped when parsing leaves that scope. When information about an identifier is needed, the simple tables are searched from the top of the stack to the bottom for entries about the identifier. The first entry encountered is the one that is used. In short, a Chain of Responsibility design pattern is used, with the chaining following the nesting.
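The stack-of-tables organization can be sketched in a few lines. The class and method names here (`ScopedSymbolTable`, `enterScope`, `leaveScope`, `declare`, `lookup`) are hypothetical, and the recorded information is reduced to a single string for brevity; a real symbol table would store richer entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical symbol table organized as a stack of simple tables. */
final class ScopedSymbolTable {
    // The head of the deque is the top of the stack, that is, the innermost scope.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    /** Called when parsing enters a new scope. */
    void enterScope() { scopes.push(new HashMap<>()); }

    /** Called when parsing leaves the current scope. */
    void leaveScope() { scopes.pop(); }

    /** Record information about an identifier in the innermost scope. */
    void declare(String name, String info) { scopes.getFirst().put(name, info); }

    /** Search from the innermost scope outward; the first entry found wins. */
    Optional<String> lookup(String name) {
        for (Map<String, String> scope : scopes) {  // iterates from top of stack to bottom
            String info = scope.get(name);
            if (info != null) {
                return Optional.of(info);
            }
        }
        return Optional.empty();
    }
}
```

If `x` is declared as `int` in an outer scope and as `double` in an inner scope, `lookup("x")` returns the `double` entry while the inner scope is on the stack and the `int` entry after `leaveScope()`.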

In a typed, class-based object-oriented language, there is a scope associated with each type in a program. In the following Java code example, there is a scope for the System class, in which the member out is declared. Since out is declared to have type PrintStream, there is another scope for the PrintStream type. The println() method is defined in this last scope.

      System.out.println("Hello, world!");
      

A compiler uses a symbol table for each of these type scopes. There is a type table that contains all of these symbol tables as entries. The entries are keyed by the type name.
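A type table of this kind might look like the following sketch. The names `TypeTable`, `declareMember`, and `resolve` are illustrative assumptions, and each member entry is reduced to the name of the member's type (or return type, for a method).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical type table: one symbol table per type, keyed by type name. */
final class TypeTable {
    // type name -> (member name -> name of the member's type or return type)
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    /** Record a member in the scope of the given type. */
    void declareMember(String typeName, String member, String memberType) {
        tables.computeIfAbsent(typeName, k -> new HashMap<>()).put(member, memberType);
    }

    /**
     * Resolve a chain of member accesses, moving to the scope of each member's type in turn.
     * Returns the type name of the last member, or null if any step fails.
     */
    String resolve(String startType, String... members) {
        String current = startType;
        for (String member : members) {
            Map<String, String> scope = tables.get(current);
            if (scope == null) {
                return null;  // no symbol table for this type
            }
            current = scope.get(member);
            if (current == null) {
                return null;  // member not declared in this scope
            }
        }
        return current;
    }
}
```

After `declareMember("System", "out", "PrintStream")` and `declareMember("PrintStream", "println", "void")`, the call `resolve("System", "out", "println")` walks the System scope and then the PrintStream scope and returns `"void"`.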

Modern parsers usually build an internal structural representation of the source code. This representation is called a parse tree. To understand parse trees, we need to look first at some grammar definitions. We will look at definitions for expressions and statements in the C programming language. These definitions are only an approximation to the truth.

In the following syntax definitions the color red indicates a token type and the color blue indicates a nonterminal type. These definitions are far from complete.

A parse tree is a structural representation of a particular source text as interpreted in a specific grammar. It has node types for each type of grammar component in the language. Each subtree corresponds to a structural component of the source code. There are two general types of nodes.

  • Leaves -
    The leaves of a parse tree are tokens from the source text. They are Leaves in the Composite design pattern and TerminalExpressions in the Interpreter design pattern.
  • Interior Nodes -
    The interior nodes of a parse tree are roots of subtrees corresponding to higher-level structures in the source text. The type of an interior node indicates the grammar rule that governs the children of that node. Interior nodes are Composites in the Composite design pattern and NonterminalExpressions in the Interpreter design pattern.
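The two node types map directly onto the Composite pattern, as the sketch below suggests. The names `ParseNode`, `TokenLeaf`, `InteriorNode`, and `sourceText` are assumptions for illustration; the tree is assembled by hand here rather than by a parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Component role in the Composite pattern. */
interface ParseNode {
    /** The source text covered by this node. */
    String sourceText();
}

/** Leaf: a single token from the source text. */
final class TokenLeaf implements ParseNode {
    private final String type;
    private final String text;

    TokenLeaf(String type, String text) {
        this.type = type;
        this.text = text;
    }

    String getType() { return type; }

    @Override
    public String sourceText() { return text; }
}

/** Composite: the root of a subtree governed by one grammar rule. */
final class InteriorNode implements ParseNode {
    private final String rule;
    private final List<ParseNode> children = new ArrayList<>();

    InteriorNode(String rule) { this.rule = rule; }

    void addChild(ParseNode child) { children.add(child); }

    String getRule() { return rule; }

    List<ParseNode> getChildren() { return children; }

    @Override
    public String sourceText() {
        // A composite's text is the concatenation of its children's text.
        StringBuilder sb = new StringBuilder();
        for (ParseNode child : children) {
            sb.append(child.sourceText());
        }
        return sb.toString();
    }
}
```

For example, a Term node for x*y would hold three children: a Factor node containing the identifier leaf `x`, the operator leaf `*`, and a Factor node containing the identifier leaf `y`. Calling `sourceText()` on the Term node then returns `x*y`.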

The diagram to the left shows a simple parse tree for the simple expression x*y. Leaves (tokens) are shown in red and interior nodes are shown in blue.

Construction of parse tree nodes for the numeric expressions grammar is shown below, along with diagrams showing typical partial parse trees.

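Constructing Expression Nodes

  // Expression: Term { ("+" | "-") Term }
  public Expression(Scanner sc) {
    // Every expression starts with a term.
    Term term = new Term(sc);
    addChild(term);
    Token op = sc.getToken();
    String type = op.getType();
    // Each additive operator must be followed by another term.
    while (type.equals("+") || type.equals("-")) {
      addChild(op);
      sc.advance();
      term = new Term(sc);
      addChild(term);
      op = sc.getToken();
      type = op.getType();
    }
  }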
      

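Constructing Term Nodes

  // Term: Factor { ("*" | "/" | "%") Factor }
  public Term(Scanner sc) {
    // Every term starts with a factor.
    Factor factor = new Factor(sc);
    addChild(factor);
    Token op = sc.getToken();
    String type = op.getType();
    // Each multiplicative operator must be followed by another factor.
    while (type.equals("*") || type.equals("/") || type.equals("%")) {
      addChild(op);
      sc.advance();
      factor = new Factor(sc);
      addChild(factor);
      op = sc.getToken();
      type = op.getType();
    }
  }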
      

Constructing Factor Nodes

  // Factor: IdentifierExpression | numeric-literal | "(" Expression ")"
  public Factor(Scanner sc) {
    // The type of the first token selects the alternative.
    Token firstToken = sc.getToken();
    String firstType = firstToken.getType();
    if (firstType.equals("identifier")) {
      // IdentifierExpression consumes the identifier itself.
      addChild(new IdentifierExpression(sc));
    } else if (firstType.equals("numeric-literal")) {
      addChild(firstToken);
      sc.advance();
    } else if (firstType.equals("(")) {
      // Parenthesized subexpression: "(" Expression ")"
      addChild(firstToken);
      sc.advance();
      addChild(new Expression(sc));
      Token lastToken = sc.getToken();
      String lastType = lastToken.getType();
      if (lastType.equals(")")) {
        addChild(lastToken);
        sc.advance();
      } else {
        // handle error: expected ")"
      }
    } else {
      // handle error: token cannot start a factor
    }
  }
            

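Constructing IdentifierExpression Nodes

  // IdentifierExpression: identifier [ Parameters ]
  public IdentifierExpression(Scanner sc) {
    Token identifier = sc.getToken();
    addChild(identifier);
    sc.advance();
    Token nextToken = sc.getToken();
    String nextType = nextToken.getType();
    // A following "(" marks a call. Parameters consumes the parentheses;
    // ParameterList alone would not consume the "(".
    if (nextType.equals("(")) {
      addChild(new Parameters(sc));
    }
  }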
      

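Constructing Parameters Nodes

  // Parameters: "(" ParameterList ")"
  public Parameters(Scanner sc) {
    // The caller has already checked that the current token is "(".
    Token lparens = sc.getToken();
    addChild(lparens);
    sc.advance();
    addChild(new ParameterList(sc));
    Token rparens = sc.getToken();
    String nextType = rparens.getType();
    if (nextType.equals(")")) {
      addChild(rparens);
      sc.advance();
    } else {
      // handle error: expected ")"
    }
  }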
      

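Constructing ParameterList Nodes

  // ParameterList: [ Expression { "," Expression } ]
  public ParameterList(Scanner sc) {
    Token nextToken = sc.getToken();
    String nextType = nextToken.getType();
    // An immediate ")" means the list is empty; leave it for Parameters to consume.
    if (nextType.equals(")")) {
      return;
    }
    addChild(new Expression(sc));
    nextToken = sc.getToken();
    nextType = nextToken.getType();
    // Each comma must be followed by another expression.
    while (nextType.equals(",")) {
      addChild(nextToken);
      sc.advance();
      addChild(new Expression(sc));
      nextToken = sc.getToken();
      nextType = nextToken.getType();
    }
  }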
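To see the recursive constructors working together, the following reduced sketch can be compiled and run on its own. It is not the code above: the names `TokenCursor`, `Node`, `ExpressionNode`, `TermNode`, and `FactorNode` are hypothetical, tokens are plain strings that have already been split, function calls are left out, and errors are reported by throwing `IllegalStateException`, which is one possible way to fill in the error-handling placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the scanner: a cursor over pre-split tokens. */
final class TokenCursor {
    private final List<String> tokens;
    private int position = 0;

    TokenCursor(String... tokens) { this.tokens = Arrays.asList(tokens); }

    /** Current token, or "end" when past the last token. */
    String getToken() { return position < tokens.size() ? tokens.get(position) : "end"; }

    void advance() { if (position < tokens.size()) position++; }
}

/** Interior node: a rule name plus children that are Nodes or token strings. */
class Node {
    private final String rule;
    private final List<Object> children = new ArrayList<>();

    Node(String rule) { this.rule = rule; }

    void addChild(Object child) { children.add(child); }

    /** Renders the subtree as a parenthesized list, e.g. (Factor x). */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("(" + rule);
        for (Object child : children) sb.append(' ').append(child);
        return sb.append(')').toString();
    }
}

final class ExpressionNode extends Node {
    ExpressionNode(TokenCursor sc) {
        super("Expression");
        addChild(new TermNode(sc));
        while (sc.getToken().equals("+") || sc.getToken().equals("-")) {
            addChild(sc.getToken());
            sc.advance();
            addChild(new TermNode(sc));
        }
    }
}

final class TermNode extends Node {
    TermNode(TokenCursor sc) {
        super("Term");
        addChild(new FactorNode(sc));
        while (sc.getToken().equals("*") || sc.getToken().equals("/") || sc.getToken().equals("%")) {
            addChild(sc.getToken());
            sc.advance();
            addChild(new FactorNode(sc));
        }
    }
}

final class FactorNode extends Node {
    FactorNode(TokenCursor sc) {
        super("Factor");
        String first = sc.getToken();
        if (first.equals("(")) {
            addChild(first);
            sc.advance();
            addChild(new ExpressionNode(sc));
            if (!sc.getToken().equals(")")) {
                throw new IllegalStateException("expected ) but found " + sc.getToken());
            }
            addChild(sc.getToken());
            sc.advance();
        } else if (isOperand(first)) {
            addChild(first);  // identifier or numeric literal
            sc.advance();
        } else {
            throw new IllegalStateException("unexpected token " + first);
        }
    }

    /** Assumption: any token starting with a letter, digit, or underscore is an operand. */
    private static boolean isOperand(String token) {
        if (token.equals("end") || token.isEmpty()) return false;
        char c = token.charAt(0);
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Parsing the tokens `x`, `*`, `y` with `new ExpressionNode(new TokenCursor("x", "*", "y"))` and calling `toString()` on the result gives `(Expression (Term (Factor x) * (Factor y)))`, which has the same shape as the parse tree for x*y described earlier.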