
4 Overview of the generated code

From a grammar specification, CppCC generates a set of C++ sources that are written to a target directory, which can be given as an argument on the command line. Taking the (complete) grammar for the language Foo presented in Example 3.3.1.0.3 as the input file, CppCC will generate the following files (their base names are identical to the class names specified in the grammar file):

foo_token.hh, foo_token.cc: header and definition of the token class and of the helper classes used by the Foo parser and scanner;
foo_scanner.hh, foo_scanner.cc: header and definition of the Foo scanner and its helper classes;
foo_parser.hh, foo_parser.cc: header and definition of the Foo parser and its helper classes.

The contents of each of these files as well as what each of the classes contains will be discussed in the remainder of this section.

The cc and hh extensions can be changed, by using the SRC_EXTENSION and HDR_EXTENSION options, to whatever extensions the compiling platform recognizes for C++ sources and headers.

All the classes in the sources generated by CppCC are placed in a separate namespace whose default name is cppcc. Therefore, before using the parser or scanner, the user code must contain at least the following two lines:

    #include "foo_parser.hh"
    using namespace cppcc; // or use cppcc::FooParser, if it's the only thing used here

The NAMESPACE_NAME option can be used to specify an alternate name for the namespace²⁴.

4.1 The token class' sources

These sources contain two class definitions: the token class and a helper class called Position. Both classes are declared in the cppcc namespace. If present, the block of code that precedes the TOKEN section in the grammar file will be inserted at the beginning of the token class' header file, just before the line that opens the cppcc namespace. The header will therefore look like this:

    ... preamble user code, if any ...
    namespace cppcc {
      class Position { ... };
      class FooToken { ... };
    }

4.1.1 The token class

The token class is the means of communication between the scanner and the parser. Each time it is invoked, the scanning routine returns a new token object containing the next token to be used by the parser; this object is an instance of the token class, whose features are given below:

cppcc::FooToken

the token class used within the generated scanner and parser

public fields:

Position bPos, ePos;

the starting and ending position of the token inside the input file;

int id;

indicates the kind of this token; each token that is declared inside the lexical section will be assigned a unique integer to indicate its kind.
public constructors:

FooToken ();

default constructor: image is initialized to the empty string, bPos and ePos are initialized using the Position's default constructor (see below) and id is set to 0. The other constructors set the fields to these same default values unless stated otherwise.

FooToken (int id_, const std::string &image_, const Position &bPos_, const Position &ePos_);

creates a new token and sets the fields to the given values.

FooToken (int id_);

creates a new token with the given id;

FooToken (const std::string &image);

creates a new token with the given image;

FooToken (const Position &bPos, const Position &ePos = Position());

creates a new token with the position fields set to the given values; if no end position is given, a default value is provided.
public methods:

std::string& image ();

returns the image of this token. The returned reference refers to an internal field of the token object, so any changes made to the image will be preserved²⁵. The actual return type of this method (and the type of the internal image field) can be controlled by the user by means of the STRING_CLASS option.

4.1.1.0.1 Example

Suppose in our project we have an already-implemented-and-used-all-over-the-place string class, called MyString. We don't want to have to convert token images from std::string into a MyString; the best way of avoiding this is to have the tokens store their images directly as MyString objects. Therefore, we add to the options section:

    STRING_CLASS = "MyString"

Then the resulting token class will have its image method declared like this:

    MyString& image ();

For this customization to work, the custom string class must meet certain (rather straightforward) conditions:

it must be default constructible
it must provide a public assign method with the same semantics as std::string::assign, i.e. it must be declared as:

    MyString& MyString::assign(const char* s, size_type n)

and its effect must be the assignment of the first n characters starting at s to *this.
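As an illustration, a minimal string class meeting both conditions might look like the sketch below. `MyString` is the hypothetical class from the example; its `c_str()` and `size()` accessors are assumptions standing in for "the rest of the project's interface", and the storage is simply delegated to a `std::string` to keep the sketch short. Only the default constructor and `assign(const char*, size_type)` correspond to the documented requirements.

```cpp
#include <cstddef>
#include <string>

// Hypothetical project string class, usable with STRING_CLASS = "MyString".
class MyString {
public:
    typedef std::size_t size_type;

    // Requirement 1: default constructible (the token creates an empty image).
    MyString() {}

    // Requirement 2: copy the first n characters starting at s into *this,
    // with the same semantics as std::string::assign(const char*, size_type).
    MyString& assign(const char* s, size_type n) {
        data_.assign(s, n);
        return *this;
    }

    // Extra accessors assumed to exist elsewhere in the project.
    const char* c_str() const { return data_.c_str(); }
    size_type size() const { return data_.size(); }

private:
    std::string data_;  // storage delegated to std::string for brevity
};
```

The second requirement suggests that the scanner fills a token's image by copying a slice of its input buffer into it, which is why a pointer-plus-length `assign` is needed rather than a conversion from `std::string`; this reading is an inference from the requirement, not taken from the generated sources.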

Also, any user code found in the token section of the grammar file will be merged into the token class body (the user is free to add more custom constructors if needed).

Additionally, for each token defined in the lexical section and for the special <eof> token, a symbolic constant id is generated as a public static field of the token class.
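To make the listing above concrete, here is a hand-written approximation of a token class with the documented fields, constructors and image() method. It is not CppCC output: the token names `NUMBER` and `PLUS`, the `EOF_ID` constant for `<eof>`, their values, their representation as `static const int`, and the private member `img_` are all assumptions made for the example. `Position` is reduced to the two fields described in Sec. 4.1.2.

```cpp
#include <string>

namespace cppcc {

// Reduced stand-in for the Position class (see Sec. 4.1.2).
struct Position {
    int ln, col;
    Position() : ln(0), col(0) {}
    Position(int ln_, int col_) : ln(ln_), col(col_) {}
};

class FooToken {
public:
    // Symbolic token ids (hypothetical names and values).
    static const int EOF_ID = 0;
    static const int NUMBER = 1;
    static const int PLUS   = 2;

    Position bPos, ePos;  // where the token starts and ends
    int id;               // kind of the token

    FooToken() : id(0) {}
    FooToken(int id_, const std::string &image_,
             const Position &bPos_, const Position &ePos_)
        : bPos(bPos_), ePos(ePos_), id(id_), img_(image_) {}
    FooToken(int id_) : id(id_) {}
    FooToken(const std::string &image) : id(0), img_(image) {}
    FooToken(const Position &bPos_, const Position &ePos_ = Position())
        : bPos(bPos_), ePos(ePos_), id(0) {}

    // Returns a reference to the stored image, so modifications persist.
    std::string& image() { return img_; }

private:
    std::string img_;  // the token's image (type set by STRING_CLASS)
};

} // namespace cppcc
```

Because image() hands out a reference, a token action could, for instance, strip the quotes from a string literal's image in place.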

4.1.2 The Position class

Besides the token class, the token header contains the declaration of the Position class. This class encapsulates a position inside a file. Its main features are given below:

cppcc::Position

a textual position inside a file

public fields:

int ln;

the line inside the file.

int col;

the column inside the file. This field is only generated if the COUNT_COLUMNS option is set to true.
public constructors:

Position ();

default constructor: the ln and col fields are initialized to 0.

Position (int ln_, int col_);

creates a new position object indicating the given line and column;

Position (const Position& o);

copy constructor.
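The scanner keeps such positions up to date as it consumes characters (see resetPos() and newLine() in Sec. 4.2.1). The sketch below illustrates that bookkeeping on a plain string. The `advance` helper is hypothetical, it assumes COUNT_COLUMNS is true so that the col field exists, and it handles line breaks itself, whereas in a generated scanner this is done by calling newLine() from the line terminator's token action.

```cpp
#include <string>

// Stand-in for cppcc::Position with COUNT_COLUMNS set to true.
struct Position {
    int ln, col;
    Position() : ln(0), col(0) {}
    Position(int ln_, int col_) : ln(ln_), col(col_) {}
};

// Hypothetical helper: returns the position reached after reading `text`
// starting at `start`. A '\n' is treated the way newLine() is described:
// the line counter is incremented and the column counter is reset to 1.
Position advance(Position start, const std::string &text) {
    Position p = start;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') {
            ++p.ln;
            p.col = 1;
        } else {
            ++p.col;
        }
    }
    return p;
}
```

Starting from line 1, column 1 (the position resetPos() sets up), reading `"ab\ncd"` leaves the position at line 2, column 3, i.e. the column of the next character to be read.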

4.2 The scanner class' sources

These sources contain the definitions of the scanner class and of the ScanException class used in lexical error handling (see Sec. 3.5.3). If present, the block of code that precedes the lexical section in the grammar file will also be included at the beginning of the header file, which will roughly look like this:

    ... preamble user code, if any ...
    namespace cppcc {
      class ScanException { ... };
      class FooScanner { ... };
    }

4.2.1 The scanner class

The main purpose of the scanner class is to encapsulate a DFA that recognizes tokens on the input stream. Its interface must mainly provide methods for retrieving tokens in the order in which they appear in the input stream. The tokens recognized by the scanner class that CppCC generates are those defined in the lexical section of the input grammar. Besides retrieving tokens one at a time, the scanner class must support lookahead, because the parser that uses the scanner must be able to inspect not only the current token but also a certain number of tokens that follow it.

The implementation of the scanner class uses a queue of matched tokens in order to allow lookahead inspection. New tokens are inserted into the queue on demand, i.e. when the parser requests tokens beyond the point reached by the scanner. In this case, the scanner invokes an internal method that scans the input stream as many times as necessary to reach the requested token. Tokens are kept in the queue and can be inspected several times by using the FooScanner::la(...) method described below. To remove a token from the queue, the FooScanner::consume() method must be called. This method removes the oldest token (reading it from the input stream first, if the queue was empty at the moment of the call). Figure 20 illustrates how the lookahead queue works.
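The queue discipline just described can be sketched with a `std::deque`. The `LookaheadQueue` class and its `next` source function are hypothetical stand-ins for the scanner and its internal matching routine, and tokens are plain strings to keep the example short; only the behaviour of `la(k)` and `consume()` follows the description above.

```cpp
#include <deque>
#include <functional>
#include <string>

// Hypothetical model of the scanner's lookahead queue. `next` plays the
// role of the internal method that matches one more token on the input.
class LookaheadQueue {
public:
    explicit LookaheadQueue(std::function<std::string()> next)
        : next_(next), scanned_(0) {}

    // Returns the (k+1)'th lookahead token, scanning only as many new
    // tokens as needed so that queue_[k] exists.
    const std::string& la(int k = 0) {
        while (static_cast<int>(queue_.size()) <= k) {
            queue_.push_back(next_());
            ++scanned_;
        }
        return queue_[k];  // deque::push_back keeps element references valid
    }

    // Removes the oldest token, scanning it first if the queue is empty.
    void consume() {
        la(0);
        queue_.pop_front();
    }

    int scanned() const { return scanned_; }  // tokens matched so far

private:
    std::function<std::string()> next_;
    std::deque<std::string> queue_;
    int scanned_;
};
```

Calling `la(2)` on an empty queue scans three tokens; a following `la(0)` is answered from the queue without touching the input, and only `consume()` makes the queue shrink.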

**Figure 20:** Functioning of the scanner's token queue.

In order to speed up character reading, the scanner class uses its own input buffer. Several methods allow the user to save the buffer, together with its current state and associated input stream, into a single opaque object that can later be restored, or simply to switch over to another input stream the next time the buffer needs to be refilled²⁶.
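The save-and-switch pattern behind pushStream() and popStream() (described in Sec. 4.2.1) is what makes include-like constructs possible. The sketch below models it with a hypothetical `MiniScanner` whose `StreamState` records only the stream pointer; the real object also holds the buffer, the pointer into it and the last match positions, and the real scanner reads through its buffer rather than word by word. As in the generated scanner, the caller keeps ownership of the stream objects.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical, much-reduced model of the scanner's input handling.
class MiniScanner {
public:
    // Opaque saved input state (reduced here to the stream pointer).
    struct StreamState {
        std::istream *in;
    };

    explicit MiniScanner(std::istream *in_ = 0) : in(in_) {}

    // Saves the current state in a newly allocated object, then switches.
    StreamState* pushStream(std::istream *next) {
        StreamState *s = new StreamState;
        s->in = in;
        in = next;
        return s;
    }

    // Restores a state saved by pushStream() and deletes the object.
    void popStream(StreamState *s) {
        in = s->in;
        delete s;
    }

    // Reads the next whitespace-separated word; false at end of input.
    bool next(std::string &word) {
        return in != 0 && static_cast<bool>(*in >> word);
    }

private:
    std::istream *in;  // not owned: opening and closing is the caller's job
};
```

A driver that meets an `include` word would push the included stream, read it to the end, then pop back and continue with the word that follows `include` in the original stream.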

The main features of the scanner's class are described below:

cppcc::FooScanner

transforms an input character stream into a stream of tokens according to the grammar

public fields:
public constructors:

FooScanner ();

creates a new scanner object with no associated input stream. In this case, an input stream must be set up before the scanner is first used.

FooScanner (istream *in_ = NULL);

creates a new scanner object that will read characters from the given stream; the state of the input stream is not checked until the first read attempt. This means that NULL can be given as argument. If so, the first call to the scanning method will also result in a call to the wrap() method prior to any token matching. This allows all the stream-opening related code to be grouped in a single place (i.e. in the wrap method).
public methods:
- input stream related methods:
  - input stream switching: these methods allow the user's code to query, set, save and restore the input state of the scanner
    
    istream& getInputStream ();
    
    returns the stream that is currently used by the scanner;
    
    void switchToStream (istream *in);
    
    causes the scanner to start reading characters from the given stream the next time its input buffer underruns. The current input state is not saved when the switch occurs. This method is intended to be called from the wrap() handler as the default action when an input stream has been completely read and the user wishes to move to the next stream. Note that switching to a new stream does not make the scanner take any action such as closing or deleting the old istream object. It is the user's responsibility to open, close and delete the stream objects.
    
    FooScanner::StreamState* pushStream (istream *in);
    
    causes the scanner to save the current input state (this includes the internal buffer and the pointer into it, the positions where the last token was matched, the associated input stream, etc.) into a newly allocated StreamState object and then switch over to the given input stream as soon as it needs to match the next token.
    
    void popStream (FooScanner::StreamState *s);
    
    causes the scanner to switch to a previously saved input state contained in the given StreamState object. As the counterpart of pushStream, this method also takes care of deleting the StreamState object that pushStream allocated. The next token will be read from the restored input stream.
  - low level character-oriented interface: these methods provide character-oriented access from the user's code to the internal input buffer of the scanner.
    
    int getChar () throw(ScanException);
    
    returns the character that would be read next by the scanner and advances its pointer to the following character²⁷.
    
    void unGetChars (const char *s, int n);
    
    puts n characters starting from s into the input buffer, causing them to be the next characters read by the scanner. Note that characters will be read in the same order as they are in s, not in reversed order.
    
    void unGetChar (char c);
    
    puts c into the input buffer causing it to be the next character read by the scanner.
    
    void unGetChars (const string &s);
    
    puts all the characters in s into the input buffer causing them to be the next characters read by the scanner. Note that characters will be read in the same order as they are in s, not in reversed order.
    
    void unGetChars (char *s);
    
    puts all the characters in s into the input buffer causing them to be the next characters read by the scanner. Unlike the version that also receives a counter argument, here s must be a null-terminated string. Note that characters will be read in the same order as they are in s, not in reversed order.
- token retrieval methods:
  
  FooToken* la (int k = 0) throw(ScanException);
  
  returns the (k+1)'th lookahead token. If k == 0, this is exactly the current token in the input stream (the LL(1) token).
  This method may cause the scanner to read tokens from the input stream and store them in the queue (for instance, if k == 2 and there was only one token in the queue, two more tokens are read and stored in the queue, which will then contain three tokens; the third one is returned as the result of the method).
  The token object is guaranteed not to be altered until a subsequent call to consume or la. The user code must never attempt to delete a token object, as this will cause unexpected behaviour from the parser.
  
  void consume () throw(ScanException);
  
  removes the current LA(1) token (i.e. the token that would be returned by a call to la(0)) from the token queue. If the token queue is empty at the moment of the call, a call to la(0) is issued prior to removing the token.
- lookahead support methods (see Sec. 3.4.3):
  
  void setMarker ();
  
  saves the current position into the input token stream so that it can be restored later. Successive calls are stacked.
  
  void rewindToMarker ();
  
  rewinds the input token stream to the most recently saved position and pops that position from the marker stack. After the rewind operation, the next tokens read are exactly the ones that were on the input stream before the matching setMarker() call.
  
  void pushBack (const Token &t);
  
  pushes the token object t at the front of the token queue, causing it to be the new LA(1) token (t is not copied, and should therefore not be deleted after the call to pushBack).
  
  bool lookingAhead ();
  
  returns true if the scanner is working in lookahead mode (i.e. at least one marker was set).
- input stream position handling:
  
  const Position& getCurrentPos ();
  
  returns the current position in the input stream. The position reported is the one at which the LA(1) token begins (this is the point the parser has actually reached; the fact that the scanner reads tokens ahead should be transparent to external classes).
  
  void resetPos();
  
  resets the current position within the input stream (can be called from within the wrap() user method, if a new input stream is opened). This method is automatically called when a new input stream is set up (by calling pushStream or switchToStream). The beginning position is line 1, column 1.
  
  void newLine();
  
  increments the line counter of the scanner and resets the column counter to 1. A call to this method should appear as part of the token action associated with the line terminator tokens²⁸.
- lexical state handling (see Sec. 3.3):
  
  void switchToState (int s);
  
  sets the lexical state of the scanner to s; has no effect if s is an invalid state. Note that the call only takes effect when the next token is read from the input stream (any tokens that are already in the token queue will be returned to the parser as they are). For this reason, state switches should only appear as part of token actions, which are always executed just after a token was matched on the input stream, so that the next token is certain to be matched according to the new state.
  
  int getState ();
  
  returns the current lexical state of the scanner.
  
  int pushState (int newState);
  
  sets the lexical state of the scanner to newState, but saves the current one onto an internal stack. The old lexical state is also returned as the result value of the call.
  
  void popState ();
  
  switches back to the topmost lexical state saved on the stack. Honestly, I can't remember what happens if the stack is empty, but I'm sure no one really cares about that at four o'clock in the morning anyway; to be on the safe side, pair every popState() with an earlier pushState().
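The marker stack used by the lookahead support methods can be modelled by remembering positions in the token stream. The `MarkedStream` class below is a hypothetical simplification that works on a pre-scanned vector of tokens, whereas the real scanner works on its lazily filled token queue; the method names mirror setMarker(), rewindToMarker() and lookingAhead() described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the marker stack over an already scanned token list.
class MarkedStream {
public:
    explicit MarkedStream(const std::vector<std::string> &toks)
        : toks_(toks), cur_(0) {}

    // (k+1)'th lookahead token; throws std::out_of_range past the end.
    const std::string& la(std::size_t k = 0) const { return toks_.at(cur_ + k); }
    void consume() { ++cur_; }

    // Saves the current position; successive calls are stacked.
    void setMarker() { markers_.push_back(cur_); }

    // Restores the most recent marker and pops it.
    // Must be paired with an earlier setMarker() (not checked here).
    void rewindToMarker() {
        cur_ = markers_.back();
        markers_.pop_back();
    }

    bool lookingAhead() const { return !markers_.empty(); }

private:
    std::vector<std::string> toks_;
    std::size_t cur_;                  // index of the current LA(1) token
    std::vector<std::size_t> markers_; // stack of saved positions
};
```

A parser trying an alternative speculatively would typically set a marker, consume tokens while checking the alternative, and rewind afterwards, so that the real parse starts again from the same token.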

Besides the methods and fields mentioned above, any user code (fields and methods) found in the code of the lexical section will be merged into the scanner class.

The token actions will be inserted into the internal scanning method at the appropriate points so that they will be executed each time a new token is accepted by the scanner²⁹.
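As a rough picture of how token actions interact with the lexical-state methods, the sketch below models a scanner with two lexical states and a hypothetical `onToken` hook standing in for the code inserted after each match. The state constants, the token spellings and the `ScannerModel` class are assumptions for the example; its `popState()` silently ignores an empty stack, which may differ from the generated code.

```cpp
#include <string>
#include <vector>

// Hypothetical lexical states.
enum { STATE_DEFAULT = 0, STATE_COMMENT = 1 };

// Minimal model of the scanner's lexical-state and line-counting methods.
class ScannerModel {
public:
    ScannerModel() : state_(STATE_DEFAULT), line_(1) {}

    int getState() const { return state_; }
    void switchToState(int s) { state_ = s; }

    // Saves the current state, switches to newState, returns the old state.
    int pushState(int newState) {
        int old = state_;
        saved_.push_back(old);
        state_ = newState;
        return old;
    }

    // Restores the topmost saved state (empty stack ignored in this model).
    void popState() {
        if (!saved_.empty()) {
            state_ = saved_.back();
            saved_.pop_back();
        }
    }

    void newLine() { ++line_; }
    int line() const { return line_; }

    // Stand-in for the token actions run right after a token is matched,
    // so any state change applies to the token matched next.
    void onToken(const std::string &tok) {
        if (tok == "/*") pushState(STATE_COMMENT);
        else if (tok == "*/" && state_ == STATE_COMMENT) popState();
        else if (tok == "\n") newLine();
    }

private:
    int state_;
    int line_;
    std::vector<int> saved_;
};
```

Feeding the tokens `a`, `/*`, a line break, `x`, `*/` and `b` through onToken leaves the model back in the default state with the line counter at 2.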

4.2.2 The ScanException class

This class is used by the scanner to signal lexical errors. When such an error occurs in the scanning method, a new object of this type is created, and the location within the input file and the reason for the error are stored in it before it is passed on to the parser or to the user's lexical error handler. The main features of the ScanException class are:

cppcc::ScanException

contains the description of a lexical error

public fields:

Position pos;

the position relative to the beginning of the input stream where the error occurred.
public constructors:

ScanException (const string &reason = "Scan exception");

creates a new ScanException object with no associated position and the given description.

ScanException (const Position &pos_, const std::string &reason_ = "");

creates a new object with pos initialized to the given Position, whose what() method will return the given string;


public methods:

char* what ();

reimplemented from the std::exception superclass. Returns a string containing a description of the lexical error that occurred.

operator string ();

returns a string containing the position in the input stream associated with the exception, followed by the description of the error.
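A class with the interface listed above could be written roughly as follows. This is a sketch, not the generated code: it keeps the reason in a private `std::string`, declares what() with the `const noexcept` signature that std::exception uses in current C++, and the `"line:col: reason"` format produced by the conversion operator is an assumption, since the exact text is not specified here. `Position` is a reduced stand-in (see Sec. 4.1.2).

```cpp
#include <exception>
#include <sstream>
#include <string>

namespace cppcc {

// Reduced stand-in for the Position class.
struct Position {
    int ln, col;
    Position() : ln(0), col(0) {}
    Position(int ln_, int col_) : ln(ln_), col(col_) {}
};

class ScanException : public std::exception {
public:
    Position pos;  // where the lexical error occurred

    ScanException(const std::string &reason = "Scan exception")
        : msg_(reason) {}
    ScanException(const Position &pos_, const std::string &reason_ = "")
        : pos(pos_), msg_(reason_) {}

    // Description of the error only.
    const char* what() const noexcept override { return msg_.c_str(); }

    // Position followed by the description (format assumed here).
    operator std::string() const {
        std::ostringstream os;
        os << pos.ln << ':' << pos.col << ": " << msg_;
        return os.str();
    }

private:
    std::string msg_;
};

} // namespace cppcc
```

Keeping what() free of the position lets a handler format the location itself, while the conversion operator gives a ready-made diagnostic line.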

4.3 The parser class' sources

These sources contain the parser class and the ParseException class, which is used in syntax error handling (see Sec. 17). If present, the block of code that precedes the syntax section in the grammar file will also be included at the beginning of the parser's header file, which roughly looks like this:

    ... preamble user code, if any ...
    namespace cppcc {
      class ParseException { ... };
      class FooParser { ... };
    }

4.3.1 The parser class

The parser class encapsulates the automaton that accepts inputs conforming to the grammar specified in the syntax section. As each syntactic expansion is matched against the input token stream, the associated user actions, if any, are executed. Each parser instance uses a scanner object to filter its input stream and transform characters into tokens. The main features of the parser class are:

cppcc::FooParser

contains a parser for the grammar specified in the syntax section

public fields:

FooScanner scanner;

the scanner object used to retrieve tokens from the input stream.

FooToken *token;

points to the token object most recently retrieved from the scanner (aka the current token).
public constructors:

FooParser (istream *in = NULL);

creates a new parser object that will read from the given stream. The pointer to the initial input stream is passed on to the scanner's constructor.
public methods:
- parsing methods: for each production found in the syntax section of the grammar file, a public method will be generated. Its formal argument list and return type are exactly those specified in the production's declaration, and the exceptions listed in the production's throw clause also appear in the method's throw clause. Additionally, if the USE_EXCEPTIONS option is set to true, std::ios_base::failure, ScanException and ParseException will also be appended to the exception list of each method. Each of these methods can be called from the user's application code in order to start parsing; the called method does not return until parsing either completes or is aborted due to an error.
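For a production declared as, say, `int sum()` that accepts numbers separated by `+`, the generated method would behave roughly like the hand-written recursive-descent function below. Everything here is hypothetical: the grammar rule, the string tokens, the `TinyParser` class, and the `ParseError` type standing in for ParseException. It assumes USE_EXCEPTIONS is true, so that a syntax error aborts the call by throwing, and it only illustrates that a production becomes a public method with its declared return type.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for cppcc::ParseException.
struct ParseError : std::runtime_error {
    explicit ParseError(const std::string &m) : std::runtime_error(m) {}
};

// Hypothetical parser for the rule  sum ::= NUMBER ( "+" NUMBER )*
class TinyParser {
public:
    explicit TinyParser(const std::vector<std::string> &toks)
        : toks_(toks), cur_(0) {}

    // One public method per production, with the declared return type.
    int sum() {
        int total = number();
        while (la() == "+") {  // LA(1) decides whether the loop continues
            consume();
            total += number();
        }
        if (la() != "<eof>") throw ParseError("unexpected '" + la() + "'");
        return total;
    }

private:
    std::string la() const {
        return cur_ < toks_.size() ? toks_[cur_] : "<eof>";
    }
    void consume() { ++cur_; }

    // Matches a NUMBER token or reports a syntax error.
    int number() {
        std::string t = la();
        if (t.empty() || t.find_first_not_of("0123456789") != std::string::npos)
            throw ParseError("number expected, found '" + t + "'");
        consume();
        return std::atoi(t.c_str());
    }

    std::vector<std::string> toks_;
    std::size_t cur_;
};
```

Calling `sum()` on `1 + 2 + 39` returns 42, while `1 + +` never returns normally: the second `+` makes `number()` throw.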

4.3.2 The ParseException class

This class is used by the parser to signal syntax errors. When such an error occurs in one of the parser's methods, a new object of this type is created, and the location within the input file and the reason for the error are stored in it before it is passed to the user's syntax error handler and/or thrown up the call stack. The main features of the ParseException class are:

cppcc::ParseException

contains the description of a syntax error

public fields:

Position pos;

the position relative to the beginning of the input stream where the error occurred. The position is the one reported by the scanner's getCurrentPos() method.
public constructors:

ParseException (const string &reason_ = "Parse exception");

creates a new ParseException object with no associated position and the given description.

ParseException (const Position &pos_, const string &reason_ = "Parse exception");

creates a new object with pos initialized to the given Position, whose what() method will return the given string;


public methods:

char* what ();

reimplemented from the std::exception superclass. Returns a string containing a description of the syntax error that occurred.

operator string ();

returns a string containing the position associated with the parse exception, followed by the exception's description.
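A typical driver, assuming USE_EXCEPTIONS is true, calls the start production inside a try block and reports both kinds of errors through their string conversions. To stay self-contained, the sketch below uses hypothetical stand-ins: `ScanError` and `ParseError` for ScanException and ParseException, and `parseInput` for a generated parsing method that fails on demand. The `"line:col: reason"` message format is likewise an assumption.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical common base for the two stand-in exception classes.
struct ErrorBase : std::exception {
    int ln, col;
    std::string reason;
    ErrorBase(int l, int c, const std::string &r) : ln(l), col(c), reason(r) {}
    const char* what() const noexcept override { return reason.c_str(); }
    // Position followed by the description, like operator string().
    operator std::string() const {
        std::ostringstream os;
        os << ln << ':' << col << ": " << reason;
        return os.str();
    }
};
struct ScanError : ErrorBase { using ErrorBase::ErrorBase; };
struct ParseError : ErrorBase { using ErrorBase::ErrorBase; };

// Hypothetical start production: '$' is a lexical error, empty input a
// syntax error, anything else is accepted.
void parseInput(const std::string &src) {
    std::string::size_type bad = src.find('$');
    if (bad != std::string::npos)
        throw ScanError(1, static_cast<int>(bad) + 1, "illegal character '$'");
    if (src.empty())
        throw ParseError(1, 1, "unexpected end of input");
}

// Driver: runs the parser and turns any error into a diagnostic line.
std::string run(const std::string &src) {
    try {
        parseInput(src);
        return "ok";
    } catch (const ScanError &e) {
        return "lexical error at " + std::string(e);
    } catch (const ParseError &e) {
        return "syntax error at " + std::string(e);
    }
}
```

Catching the two classes separately lets the application word lexical and syntax errors differently; a single `catch (const std::exception &)` with what() would also work but loses the position.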

Alec Panovici 2003-02-01