

4 Overview of the generated code

From a grammar specification, CppCC generates a set of C++ sources, which are written to a target directory that can be given as a command-line argument. Taking the (complete) grammar for the language Foo presented in Example 3.3.1.0.3 as the input file, CppCC will generate the following files (their base names are identical to the class names specified in the grammar file):

foo_token.hh, foo_token.cc
header and definition of the token class and of the helper classes used by the Foo parser and scanner;
foo_scanner.hh, foo_scanner.cc
header and definition of the Foo scanner and its helper classes;
foo_parser.hh, foo_parser.cc
header and definition of the Foo parser and its helper classes.
The contents of each of these files as well as what each of the classes contains will be discussed in the remainder of this section.

The cc and hh extensions can be changed to whatever the compiling platform recognizes as the C++ source and header extension, by using the SRC_EXTENSION and HDR_EXTENSION options.

All the class names in the sources generated by CppCC are placed in a separate namespace whose default name is cppcc. Therefore, before using the parser or scanner, the user code must contain at least the following two lines:

#include "foo_parser.hh" 
using namespace cppcc; // or: using cppcc::FooParser; if it's the only thing used here
The NAMESPACE_NAME option can be used to specify an alternate name for the namespace.
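The alternatives to a blanket using-directive can be sketched as follows. This is only an illustration: the cppcc::FooParser class below is a stand-in for the one declared in the generated foo_parser.hh, and its name() method is hypothetical, added so that the example has something to call.

```cpp
#include <string>

// Stand-in for the generated header "foo_parser.hh". The real class is
// produced by CppCC; its actual members are not shown here.
namespace cppcc {
class FooParser {
public:
    // Hypothetical member, present only for the sake of the example.
    std::string name() const { return "FooParser"; }
};
}  // namespace cppcc

// Import a single name instead of the whole namespace.
using cppcc::FooParser;

inline std::string parserNames() {
    FooParser viaUsing;          // found through the using-declaration
    cppcc::FooParser qualified;  // a fully qualified name always works
    return viaUsing.name() + "/" + qualified.name();
}
```

A using-declaration or a fully qualified name keeps the other generated class names out of the enclosing scope, which may matter when user code defines classes with similar names.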


4.1 The token class' sources

These sources contain two class definitions: the token class and a helper class called Position. Both classes are declared in the cppcc namespace. If present, the block of code that precedes the TOKEN section in the grammar file is inserted at the beginning of the token class' header file, just before the line that opens the cppcc namespace. The header will therefore look like this:

... preamble user code, if any ... 
namespace cppcc 
{
  class Position { ... }; 
  class FooToken { ... }; 
}


4.1.1 The token class

The token class is the means of communication between the scanner and the parser. Each time it is invoked, the scanning routine returns a new token object, an instance of the token class, containing the next token to be used by the parser. The main features of the token class are given below:

cppcc::FooToken
the token class used within the generated scanner and parser


4.1.1.0.1 Example

Suppose in our project we have an already-implemented-and-used-all-over-the-place string class, called MyString. We don't want to have to convert token images from std::string into a MyString; the best way of avoiding this is to have the tokens store their images directly as MyString objects. Therefore, we add to the options section:

STRING_CLASS = "MyString"

Then the resulting token class will have its image method declared like this:

MyString& image ();

For this customization to work, the custom string class must meet certain (rather straightforward) conditions:

Also, any user code found in the token section of the grammar file will be merged into the body of the token class (the user is free to add more custom constructors if needed).

Additionally, for each token defined in the syntax section and for the special <eof> token, a symbolic constant id is generated as a public static field of the token's class.
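To make the shape of such a class concrete, here is a minimal sketch of a token class that stores its image in a custom string class and exposes one static constant per token. It is not the code CppCC emits: the MyString stand-in, the constant names NUMBER and IDENT, their values, and the constructors are all assumptions made for illustration.

```cpp
#include <string>

namespace token_sketch {

// Hypothetical stand-in for the project's own string class. Only the
// operations this sketch needs are provided.
class MyString {
public:
    MyString() = default;
    MyString(const char* s) : data_(s) {}
    MyString& operator+=(char c) { data_ += c; return *this; }
    bool operator==(const MyString& o) const { return data_ == o.data_; }
    const std::string& str() const { return data_; }
private:
    std::string data_;
};

// Illustrative token class, NOT the generated one.
class FooToken {
public:
    // One symbolic constant per token, plus one for <eof>.
    // Names and values are invented for this sketch.
    static const int eof = 0;
    static const int NUMBER = 1;
    static const int IDENT = 2;

    FooToken() : id(eof) {}
    FooToken(int tokenId, const MyString& img) : id(tokenId), image_(img) {}

    int id;                               // which token was matched
    MyString& image() { return image_; }  // matched text, kept as a MyString

private:
    MyString image_;
};

}  // namespace token_sketch
```

With such a layout, user code can compare tok.id against FooToken::NUMBER and read tok.image() without any conversion from std::string.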


4.1.2 The Position class

Besides the token's class, the token's header contains the declaration for the Position class. This class is used to encapsulate a position inside a file. Its main features are given below:

cppcc::Position
a textual position inside a file
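As a rough illustration of what "a textual position inside a file" can amount to, the sketch below tracks a line and a column while characters are consumed. The member names (line, column, advance, str) are assumptions made for this example and are not taken from the generated Position class.

```cpp
#include <string>

namespace position_sketch {

// Illustrative only: member names are assumptions, not the generated API.
struct Position {
    int line = 1;
    int column = 1;

    // Advance past one character of input, tracking line breaks.
    void advance(char c) {
        if (c == '\n') {
            ++line;
            column = 1;
        } else {
            ++column;
        }
    }

    // Render the position as "line:column".
    std::string str() const {
        return std::to_string(line) + ":" + std::to_string(column);
    }
};

}  // namespace position_sketch
```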


4.2 The scanner class' sources

These sources contain the definitions of the scanner's class and of the ScanException class used in lexical error handling (see Sec. 3.5.3). If present, the block of code that precedes the lexical section in the grammar file will also be included at the beginning of the header file, which will roughly look like this:

... preamble user code, if any ... 
namespace cppcc 
{
  class ScanException { ... }; 
  class FooScanner { ... }; 
}


4.2.1 The scanner class

The main purpose of the scanner class is to encapsulate a DFA that recognizes tokens in the input stream. Its interface mainly provides methods for retrieving tokens in the order in which they appear in the input stream. The tokens recognized by the scanner class that CppCC generates are those defined in the lexical section of the input grammar. Besides retrieving tokens one at a time, the scanner class must support lookahead examination of tokens, because the parser must be able to inspect not only the current token but also a certain number of tokens that follow it.

The implementation of the scanner class uses a queue of matched tokens to allow lookahead inspection. New tokens are inserted into the queue on demand, i.e. when the parser requests tokens beyond the point reached by the scanner. In this case, the scanner invokes an internal method that scans the input stream as many times as necessary to reach the requested token. Tokens are kept in the queue and can be inspected several times by using the FooScanner::la(...) method described below. To remove a token from the queue, the FooScanner::consume() method must be called. This method removes the oldest token (reading it from the input stream at that moment if the queue was empty at the time of the call). Figure 20 illustrates the functioning of the lookahead queue.

Figure 20: Functioning of the scanner's token queue.
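The queue discipline described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the generated FooScanner: tokens come from a fixed vector instead of a DFA, la(n) is assumed to be 1-based, consume() is assumed to return the removed token, and the scanned() counter exists only so that the on-demand behaviour can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

namespace queue_sketch {

struct Token {
    std::string image;
    bool eof;
};

// Illustrative scanner with a lookahead queue.
class QueueScanner {
public:
    explicit QueueScanner(std::vector<std::string> words)
        : source_(std::move(words)), next_(0) {}

    // Return the n-th lookahead token (n = 1 is the oldest queued token),
    // scanning further only if the queue is too short.
    const Token& la(std::size_t n = 1) {
        assert(n >= 1);
        while (queue_.size() < n) queue_.push_back(scanOne());
        return queue_[n - 1];
    }

    // Remove and return the oldest token, scanning it first if the
    // queue is empty at the time of the call.
    Token consume() {
        if (queue_.empty()) queue_.push_back(scanOne());
        Token t = queue_.front();
        queue_.pop_front();
        return t;
    }

    // Number of non-eof tokens read from the source so far.
    std::size_t scanned() const { return next_; }

private:
    // Stand-in for the internal scanning method.
    Token scanOne() {
        if (next_ < source_.size()) return Token{source_[next_++], false};
        return Token{"", true};
    }

    std::vector<std::string> source_;
    std::size_t next_;
    std::deque<Token> queue_;
};

}  // namespace queue_sketch
```

Calling la(2) on a fresh scanner reads two tokens from the source; a following la(1) reads nothing more, because the token is already queued. Only consume() shortens the queue.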

In order to speed up character reading, the scanner class uses its own input buffer. Several methods are provided that allow the user either to save a buffer, together with its current state and associated input stream, in a single opaque object that can be restored later, or simply to switch over to another input stream the next time the buffer needs to be refilled.

The main features of the scanner's class are described below:

cppcc::FooScanner
transforms an input character stream into a stream of tokens according to the grammar

Besides the methods and fields mentioned above, any user code (fields and methods) found in the code of the lexical section will be merged into the scanner class.

The token actions will be inserted into the internal scanning method at the appropriate points so that they will be executed each time a new token is accepted by the scanner.


4.2.2 The ScanException class

This class is used by the scanner for signaling lexical errors. When such an error occurs in the scanning method, a new object of this type is created, and the location within the input file and the reason for the error are stored inside it before it is passed on to the parser or to the user's lexical error handler. The main features of the ScanException class are:

cppcc::ScanException
contains the description of a lexical error

public methods:

char* what ();
reimplemented from the std::exception superclass. Returns a string containing a description of the lexical error that occurred.
operator string();
returns a string containing the position in the input stream associated with the exception followed by the description of the error.
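The relationship between the two methods can be sketched as below. This is an assumption-laden illustration rather than the generated class: the constructor, the Position layout, and the "line:column: reason" format are invented, and what() is written with the standard std::exception signature (const char*, const, noexcept), which differs from the declaration listed above.

```cpp
#include <exception>
#include <string>

namespace scanex_sketch {

// Hypothetical position type, used only by this sketch.
struct Position {
    int line;
    int column;
};

// Illustrative exception class, NOT the generated ScanException.
class ScanException : public std::exception {
public:
    ScanException(const Position& where, const std::string& reason)
        : where_(where), reason_(reason) {}

    // Description of the lexical error only.
    const char* what() const noexcept override { return reason_.c_str(); }

    // Position in the input followed by the description.
    operator std::string() const {
        return std::to_string(where_.line) + ":" +
               std::to_string(where_.column) + ": " + reason_;
    }

private:
    Position where_;
    std::string reason_;
};

}  // namespace scanex_sketch
```

A handler that only needs the message can call what(), while one that reports to the user would typically convert the exception to a string to get the location as well.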


4.3 The parser class' sources

These sources contain the parser's class and the ParseException class, which is used in syntax error handling (see Sec. 17). If present, the block of code that precedes the syntax section in the grammar file will also be included at the beginning of the parser's header file, which roughly looks like this:

... preamble user code, if any ... 
namespace cppcc 
{
  class ParseException { ... }; 
  class FooParser { ... }; 
}


4.3.1 The parser class

The parser class encapsulates the automaton that accepts inputs conforming to the grammar specified in the syntax section. As each syntactic expansion is matched against the input token stream, the associated user actions, if any, are executed. Each instance of a parser uses a scanner object to filter its input stream and transform characters into tokens. The main features of the parser class are:

cppcc::FooParser
contains a parser for the grammar specified in the syntax section


4.3.2 The ParseException class

This class is used by the parser for signaling syntax errors. When such an error occurs in one of the parser's methods, a new object of this type is created and the location within the input file and the error's reason are stored inside it before it is passed to the user's syntax error handler and/or thrown up the call stack. The main features of the ParseException class are:

cppcc::ParseException
contains the description of a syntax error

public methods:

char* what ();
reimplemented from the std::exception superclass. Returns a string containing a description of the syntax error that occurred.
operator string ();
returns a string containing the position associated with the parse exception followed by the exception's description.
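How a caller might deal with an exception thrown up the call stack can be sketched with a toy parser. Everything here is hypothetical: parseList stands in for a generated parser method, the grammar (items separated by commas) is invented, the position is a token index rather than a file position, and what() uses the standard std::exception signature.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

namespace parser_sketch {

// Illustrative exception class, NOT the generated ParseException.
class ParseException : public std::exception {
public:
    ParseException(std::size_t tokenIndex, const std::string& reason)
        : index_(tokenIndex), reason_(reason) {}

    const char* what() const noexcept override { return reason_.c_str(); }

    // Position followed by the description.
    operator std::string() const {
        return "token " + std::to_string(index_) + ": " + reason_;
    }

private:
    std::size_t index_;
    std::string reason_;
};

// Accepts: item ("," item)*  where an item is any token other than ",".
// Returns the number of items; throws ParseException on a syntax error.
inline std::size_t parseList(const std::vector<std::string>& tokens) {
    std::size_t pos = 0, items = 0;
    for (;;) {
        if (pos >= tokens.size() || tokens[pos] == ",")
            throw ParseException(pos, "expected an item");
        ++pos;
        ++items;
        if (pos == tokens.size()) return items;
        if (tokens[pos] != ",")
            throw ParseException(pos, "expected ','");
        ++pos;
    }
}

// Catch the exception at the call site and report it as a string.
inline std::string tryParse(const std::vector<std::string>& tokens) {
    try {
        return "ok: " + std::to_string(parseList(tokens)) + " item(s)";
    } catch (const ParseException& e) {
        return "syntax error at " + static_cast<std::string>(e);
    }
}

}  // namespace parser_sketch
```

Catching by const reference and using the string conversion gives the caller both the location and the reason in one message.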


Alec Panovici 2003-02-01