4 Overview of the generated code
From a grammar specification, CppCC generates a set of C++ sources
and writes them to a target directory, which can be given
as an argument on the command line. Taking the (complete) grammar
for the language Foo presented in Example 3.3.1.0.3 as the input
file, CppCC will generate the following files (their base names are
identical to the class names specified in the grammar file):
- foo_token.hh, foo_token.cc
- header and definition of the token class
and of the helper classes used by the Foo parser and scanner;
- foo_scanner.hh, foo_scanner.cc
- header and definition of the Foo
scanner and its helper classes;
- foo_parser.hh, foo_parser.cc
- header and definition of the Foo
parser and its helper classes.
The contents of each of these files as well as what each of the classes
contains will be discussed in the remainder of this section.
The cc and hh extensions can be changed to whatever
the compiling platform recognizes as the C++ source and header extension,
by using the SRC_EXTENSION and HDR_EXTENSION options.
All the class names in the sources generated with CppCC will be placed
in a separate namespace whose default name is cppcc. Therefore,
before using the parser or scanner, the user code must contain at
least the following two lines:
#include "foo_parser.hh"
using namespace cppcc; // or: using cppcc::FooParser; if it's the only thing used here
The NAMESPACE_NAME option can be used to specify an alternate
name for the namespace.
4.1 The token class' sources
These sources contain two class definitions: the token class and
a helper class called Position. Both classes are declared in
the cppcc namespace. If present, the block of code that precedes the
TOKEN section in the grammar file will be inserted at the beginning
of the token class' header file, just before the line that declares
the cppcc namespace. The header will therefore look like this:
... preamble user code, if any ...
namespace cppcc
{
class Position { ... };
class FooToken { ... };
}
4.1.1 The token class
The token class is the means of communication between the scanner and
the parser. Each time it is invoked, the scanning routine returns
a new token object containing the next token to be used by the parser.
The token object is an instance of the token class. The features of
the token class are given below:
- cppcc::FooToken
- the token class used within the generated scanner
and parser
- public fields:
- Position bPos, ePos;
- the starting and ending position
of the token inside the input file;
- int id;
- indicates the kind of this token; each token that
is declared inside the lexical section will be assigned a unique integer
to indicate its kind.
- public constructors:
- FooToken ();
- default constructor: image is initialized
to the empty string, bPos and ePos are initialized
using the Position's default constructor (see below) and the id
is set to 0. These are the default values to which the fields
will be set by the other constructors unless otherwise mentioned.
- FooToken (int id_, const std::string &image_, const Position &bPos_, const Position &ePos_);
- creates
a new token and sets the fields to the given values.
- FooToken (int id_);
- creates a new token with the given
id;
- FooToken (const std::string &image);
- creates a new token
with the given image;
- FooToken (const Position &bPos, const Position &ePos = Position());
- creates
a new token with the position fields set to the given values; if no
end position is given, a default value is provided.
- public methods:
- std::string& image ();
- returns the image of this token.
The obtained reference points to an internal field of the token object,
so any changes made to the image will be preserved. The actual return
type of this method (and of the internal image
field) can be controlled by the user, by means of the STRING_CLASS
option.
4.1.1.0.1 Example
Suppose in our project we have an already-implemented-and-used-all-over-the-place
string class, called MyString. We don't want to have to convert
token images from std::string into a MyString; the
best way of avoiding this is to have the tokens store their images
directly as MyString objects. Therefore, we add to the options
section:
STRING_CLASS = "MyString"
Then the resulting token class will have its image method
declared like this:
MyString& image ();
For this customization to work, the custom string class must meet
certain (rather straightforward) conditions:
Also, any user code found in the token section of the grammar
file will be merged into the body of the token class (the user is free
to add more custom constructors if needed).
Additionally, for each token defined in the syntax section and for
the special <eof> token, a symbolic constant id is generated
as a public static field of the token class.
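The following is a minimal hand-written sketch of what a token class with these features might look like. It is not CppCC output: the namespace name sketch, the constant names EOF_ID and IDENT, their values, and the helper upcaseImage are all assumptions made for illustration. It mainly shows that image() hands out a reference, so edits made through it persist in the token.

```cpp
#include <cassert>
#include <string>

namespace sketch {  // stand-in for the generated cppcc namespace (assumption)

struct Position {
  int ln, col;
  Position() : ln(0), col(0) {}
  Position(int ln_, int col_) : ln(ln_), col(col_) {}
};

class FooToken {
 public:
  // Hypothetical token-kind constants; real names come from the grammar.
  static const int EOF_ID = 0;
  static const int IDENT = 1;

  Position bPos, ePos;
  int id;

  FooToken() : id(0) {}
  FooToken(int id_, const std::string &image_,
           const Position &bPos_, const Position &ePos_)
      : bPos(bPos_), ePos(ePos_), id(id_), image_field(image_) {}

  // Returns a reference to the stored image, so edits persist in the token.
  std::string &image() { return image_field; }

 private:
  std::string image_field;
};

// Upper-cases a token image in place through the returned reference.
inline void upcaseImage(FooToken &t) {
  std::string &img = t.image();
  for (std::string::size_type i = 0; i < img.size(); ++i)
    if (img[i] >= 'a' && img[i] <= 'z') img[i] = char(img[i] - 'a' + 'A');
}

}  // namespace sketch
```

A token action could use the same pattern to normalize an image (for example, case-folding keywords) before the parser sees it.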
4.1.2 The Position class
Besides the token's class, the token's header contains the declaration
for the Position class. This class is used to encapsulate
a position inside a file. Its main features are given below:
- cppcc::Position
- a textual position inside a file
- public fields:
- int ln;
- the line inside the file.
- int col;
- the column inside the file. This field is only
generated if the COUNT_COLUMNS option is set to true.
- public constructors:
- Position ();
- default constructor: ln and col
fields are initialized to 0.
- Position (int ln_, int col_);
- creates a new position
object indicating the given line and column;
- Position (const Position& o);
- copy constructor.
4.2 The scanner class' sources
These sources contain the definitions of the scanner class and of
the ScanException class used in lexical error handling (see Sec. 3.5.3).
If present, the block of code that precedes the lexical section in
the grammar file will also be included at the beginning of the header
file, which will roughly look like this:
... preamble user code, if any ...
namespace cppcc
{
class ScanException { ... };
class FooScanner { ... };
}
4.2.1 The scanner class
The main purpose of the scanner class is to encapsulate a DFA that
recognizes tokens on the input stream. Its interface mainly provides
methods for retrieving tokens in the order in which they appear in
the input stream. The tokens recognized by the scanner class that
CppCC generates are those defined in the lexical section
of the input grammar. Besides retrieving tokens one at a time, the
scanner class must facilitate lookahead examination of tokens, because
the parser that uses the scanner must be able to inspect not only
the current token, but also a certain number of tokens that follow
it.
The implementation of the scanner class uses a queue of matched tokens
in order to allow lookahead inspection. New tokens are inserted into
the queue upon request, i.e. when the parser requests tokens beyond
the point reached by the scanner. In this case, the scanner will invoke
an internal method that will scan the input stream as many times as
necessary to reach the token that was requested. Tokens are kept in
the queue and can be inspected several times by using the FooScanner::la(...)
method described below. To request a token to be removed from the
queue, the FooScanner::consume() method must be called. This
method will remove the oldest token (if the queue is empty at the
moment of the call, the token is first read from the input stream
and then removed). Figure 20 illustrates the functioning
of the lookahead queue.
Figure 20: Functioning of the scanner's token queue.
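The queue behaviour described above can be sketched in a few lines. This is a simplified stand-in, not the generated scanner: the class name WordScanner, the queued() helper, and the rule that tokens are whitespace-separated words are assumptions, and tokens are plain strings rather than FooToken objects.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <sstream>
#include <string>

// Stand-in scanner: tokens are whitespace-separated words (an assumption;
// a generated scanner would run its DFA here instead).
class WordScanner {
 public:
  explicit WordScanner(std::istream &in) : in_(in) {}

  // Returns the (k+1)-th lookahead token, scanning only as far as needed.
  const std::string &la(std::size_t k = 0) {
    while (queue_.size() <= k) queue_.push_back(scanOne());
    return queue_[k];
  }

  // Removes the oldest token, reading it first if the queue is empty.
  void consume() {
    la(0);
    queue_.pop_front();
  }

  // Number of tokens currently held in the lookahead queue.
  std::size_t queued() const { return queue_.size(); }

 private:
  // Reads one token from the stream; "<eof>" marks the end of input.
  std::string scanOne() {
    std::string word;
    if (in_ >> word) return word;
    return "<eof>";
  }

  std::istream &in_;
  std::deque<std::string> queue_;
};
```

With the input "a b c d", a first call to la(2) scans three tokens and returns the third; a following consume() removes only the oldest one, leaving two tokens queued.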
In order to speed up character reading, the scanner class uses its
own input buffer. Several methods are provided that allow the user
either to save a buffer, together with its current state and associated
input stream, in a single opaque object that can be restored later,
or simply to switch over to another input stream the next time the
buffer needs to be refilled.
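As a rough illustration of saving and restoring an input state, the sketch below keeps a stream pointer, a buffer, and a read offset, and snapshots all three. It is an assumption-laden model, not the generated code: the class name CharSource, the fields of StreamState, the line-wise refill, and the -1 end marker are all invented for the example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

class CharSource {
 public:
  // Opaque snapshot of the input state: stream, buffered text, read offset.
  struct StreamState {
    std::istream *in;
    std::string buffer;
    std::string::size_type next;
  };

  explicit CharSource(std::istream *in) : in_(in), next_(0) {}

  // Saves the current state in a new object and switches to `in`.
  StreamState *pushStream(std::istream *in) {
    StreamState *saved = new StreamState;
    saved->in = in_;
    saved->buffer = buffer_;
    saved->next = next_;
    in_ = in;
    buffer_.clear();
    next_ = 0;
    return saved;
  }

  // Restores a saved state and deletes the snapshot object.
  void popStream(StreamState *s) {
    in_ = s->in;
    buffer_ = s->buffer;
    next_ = s->next;
    delete s;
  }

  // Returns the next character, or -1 when the current stream is exhausted.
  int getChar() {
    if (next_ == buffer_.size()) {  // buffer underrun: refill line-wise
      buffer_.clear();
      next_ = 0;
      if (!in_ || !std::getline(*in_, buffer_)) return -1;
      buffer_ += '\n';
    }
    return static_cast<unsigned char>(buffer_[next_++]);
  }

 private:
  std::istream *in_;
  std::string buffer_;
  std::string::size_type next_;
};
```

This is the pattern an include-file mechanism would typically rely on: push the state when the nested input starts, pop it when the nested input is exhausted, and reading resumes exactly where it left off.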
The main features of the scanner's class are described below:
- cppcc::FooScanner
- transforms an input character stream into
a stream of tokens according to the grammar
- public fields:
- public constructors:
- FooScanner ();
- creates a new scanner object with no associated
input stream. In this case, an input stream must be set up before
the scanner is first used.
- FooScanner (istream *in_ = NULL);
- creates a new scanner
object that will read characters from the given stream; the state
of the input stream is not checked until the first read attempt. This
means that NULL can be given as argument. If so, the first
call to the scanning method will also result in a call to the wrap()
method prior to any token matching. This allows all the stream-opening
related code to be grouped in a single place (i.e. in the wrap method).
- public methods:
- input stream related methods:
- input stream switching: these methods allow the user's code to query,
set, save and restore the input state of the scanner
- istream& getInputStream ();
- returns the stream that is
currently used by the scanner;
- void switchToStream (istream *in);
- causes the scanner
to start reading characters from the given stream the next time its input
buffer underruns. The current input state will not be saved when the
switch occurs. This method is intended to be called from the wrap()
handler as the default action when an input stream has been completely
read and the user wishes to move to the next stream. Note that switching
to a new stream does not involve the scanner taking any actions such
as closing or deleting the old istream object. It is the user's responsibility
to open, close and delete the stream objects.
- FooScanner::StreamState* pushStream (istream *in);
- causes
the scanner to save the current input state (this includes the internal
buffer and the pointer into it, the position where the last token was
matched, the associated input stream, etc.) into a newly allocated
StreamState object and then switch over to the given input
stream as soon as it needs to match the next token.
- void popStream (FooScanner::StreamState *s);
- causes
the scanner to switch to a previously saved input state that is contained
in the given StreamState object. As the counterpart of pushStream,
this method also takes care of deleting the StreamState object
that pushStream allocated. The next token read will be from the restored
input stream.
- low-level character-oriented interface: these methods provide character-oriented
access from the user's code to the internal input buffer of the scanner.
- int getChar () throw(ScanException);
- returns the current
character that would have been read by the scanner and advances
its pointer to the next character.
- void unGetChars (const char *s, int n);
- puts n characters
starting from s into the input buffer, causing them to be the next characters
read by the scanner. Note that characters will be read in the
same order as they are in s, not in reversed order.
- void unGetChar (char c);
- puts c into the input buffer
causing it to be the next character read by the scanner.
- void unGetChars (const string &s);
- puts all the characters
in s into the input buffer causing them to be the next characters
read by the scanner. Note that characters will be read in the
same order as they are in s, not in reversed order.
- void unGetChars (char *s);
- puts all the characters
in s into the input buffer causing them to be the next characters
read by the scanner. Unlike the version that also receives a counter
argument, here s must be a null-terminated string. Note that characters
will be read in the same order as they are in s, not in reversed
order.
- token retrieval methods:
- FooToken* la (int k = 0) throw(ScanException);
- returns
the (k+1)'th lookahead token. If k == 0, this is exactly
the current token in the input stream (the LL(1) token).
This method may cause the scanner to read tokens from the input stream
and store them in the queue. For instance, if k == 2 and
there was only one token in the queue, two more tokens are read
and stored in the queue, which will then contain three tokens; the
third is returned as the result of the method.
The token object is guaranteed not to be altered until a subsequent
call to consume or la. The user code must never attempt to
delete a token object, as this will cause unexpected behaviour from
the parser.
- void consume () throw(ScanException);
- removes
the current LA(1) token (i.e. the token that would be returned by a call
to la(0)) from the token queue. If the token queue is empty at
the moment of the call, a call to la(0) is issued prior to removing
the token.
- lookahead support methods (see Sec. 3.4.3):
- void setMarker ();
- saves the current position into the
input token stream so that it can be restored later. Successive calls
are stacked.
- void rewindToMarker ();
- rewinds the input token stream
to the most recently saved position, and pops that position from the
markers stack. After the rewind operation, the next tokens read are
exactly the ones that were on the input stream previous to the matching
setMarker() call.
- void pushBack (const Token &t);
- pushes the token object
t at the front of the token queue, causing it to be the new LA(1)
token (t is not copied, and should therefore not be deleted after
the call to pushBack).
- bool lookingAhead ();
- returns true if the scanner is working
in lookahead mode (i.e. at least one marker was set).
- input stream position handling:
- const Position& getCurrentPos ();
- returns the current
position into the input stream. The current position that is reported
is that at which the LA(1) token begins (this is actually the point
that the parser has reached, the fact that the scanner is reading
tokens ahead should be transparent to the external classes).
- void resetPos();
- resets the current position within the
input stream (can be called from within the wrap() user method,
if a new input stream is opened). This method is automatically called
when a new input stream is set up (by calling pushStream
or switchToStream). The beginning position is line 1, column
1.
- void newLine();
- increments the line counter of the scanner
and resets the column counter to 1. A call to this method should appear
as part of the token action associated with the line terminator tokens.
- lexical state handling (see Sec. 3.3):
- void switchToState (int s);
- sets the lexical state of
the scanner to s; has no effect if s is an invalid state. Note that
the call only takes effect when a new token is read from the
input stream (any tokens that are already in the token buffer will
be returned to the parser as they are). For this reason, state switches
should only appear as part of token actions, which are always executed
just after a token has been matched on the input stream, when it is certain
that the next token will be matched according to the new state.
- int getState ();
- returns the current lexical state of
the scanner.
- int pushState (int newState);
- sets the lexical state
of the scanner to newState, but saves the current one onto an internal
stack. The old lexical state is also returned as the result value
of the call.
- void popState ();
- switches back to the topmost lexical
state saved on the stack. The behaviour of this method when the
stack is empty is not specified here.
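The stacking behaviour of setMarker() and rewindToMarker() can be sketched as follows. This is a simplified model under stated assumptions, not the generated scanner: the class name MarkableTokens is invented, the tokens are pre-scanned strings held in a vector, and a marker is simply an index into that vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class MarkableTokens {
 public:
  explicit MarkableTokens(const std::vector<std::string> &toks)
      : toks_(toks), cur_(0) {}

  // Current LA(1) token; "<eof>" past the end of input.
  std::string la() const { return cur_ < toks_.size() ? toks_[cur_] : "<eof>"; }
  void consume() { if (cur_ < toks_.size()) ++cur_; }

  // Saves the current position; successive calls are stacked.
  void setMarker() { markers_.push_back(cur_); }

  // Restores the most recently saved position and pops it from the stack.
  void rewindToMarker() {
    assert(!markers_.empty());
    cur_ = markers_.back();
    markers_.pop_back();
  }

  // True while at least one marker is set.
  bool lookingAhead() const { return !markers_.empty(); }

 private:
  std::vector<std::string> toks_;  // pre-scanned tokens (simplification)
  std::size_t cur_;
  std::vector<std::size_t> markers_;
};
```

Because markers nest, an inner lookahead attempt can be rewound without disturbing an outer one that is still in progress.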
Besides the methods and fields mentioned above, any user code (fields
and methods) found in the code of the lexical section will be merged
into the scanner class.
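The ordering guarantee of the unGetChars family listed above (characters come back in the same order as in s, not reversed) is easy to get wrong, so here is a small sketch of one way to obtain it. The class name PushbackBuffer, the single-string buffer, and the -1 end marker are assumptions for illustration; the generated scanner's buffer is certainly organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class PushbackBuffer {
 public:
  explicit PushbackBuffer(const std::string &input) : buf_(input), next_(0) {}

  // Next character, or -1 at end of input.
  int getChar() {
    if (next_ == buf_.size()) return -1;
    return static_cast<unsigned char>(buf_[next_++]);
  }

  // Makes s[0..n) the next characters read, in the same order as in s.
  void unGetChars(const char *s, std::size_t n) {
    buf_.erase(0, next_);  // drop what was already read
    buf_.insert(0, s, n);  // put the block in front, order preserved
    next_ = 0;
  }
  void unGetChars(const std::string &s) { unGetChars(s.data(), s.size()); }
  void unGetChar(char c) { unGetChars(&c, 1); }

 private:
  std::string buf_;
  std::string::size_type next_;
};
```

Inserting the whole block at once, rather than pushing characters back one by one, is what keeps the original order.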
The token actions will be inserted into the internal scanning method
at the appropriate points so that they will be executed each time
a new token is accepted by the scanner.
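Token actions are also where lexical state changes normally happen, so a sketch of the state stack may help. The class name LexStates and the initial state 0 are assumptions, state validity checking is left out, and the empty-stack behaviour of popState (leaving the state unchanged) is a guess made only so the example is well defined.

```cpp
#include <cassert>
#include <vector>

class LexStates {
 public:
  LexStates() : state_(0) {}

  int getState() const { return state_; }
  void switchToState(int s) { state_ = s; }

  // Saves the current state, switches to newState, returns the old state.
  int pushState(int newState) {
    int old = state_;
    saved_.push_back(old);
    state_ = newState;
    return old;
  }

  // Returns to the most recently saved state.
  // Assumption: an empty stack leaves the state unchanged.
  void popState() {
    if (saved_.empty()) return;
    state_ = saved_.back();
    saved_.pop_back();
  }

 private:
  int state_;
  std::vector<int> saved_;
};
```

A typical use would be a token action that calls pushState on the opening delimiter of a nested construct (a comment or string, say) and popState on the closing one.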
4.2.2 The ScanException class
This class is used by the scanner for signaling lexical errors. When
such an error occurs in the scanning method, a new object of
this type is created, and the location within the input file and the
reason for the error are stored inside it before it is passed on to the
parser or to the user's lexical error handler. The main features of
the ScanException class are:
- cppcc::ScanException
- contains the description of a lexical
error
- public fields:
- Position pos;
- the position relative to the beginning of
the input stream where the error occurred.
- public constructors:
- ScanException (const string &reason = "Scan exception");
- creates
a new ScanException object with no associated position and the given
description.
- ScanException (const Position &pos_, const std::string &reason_ = "");
- creates
a new object with pos initialized to the given Position and that will
return the given string as the result of its what() method.
- public methods:
- char* what ();
- reimplemented from the std::exception
superclass. Returns a string containing a description of the lexical
error that occurred.
- operator string();
- returns a string containing the position
in the input stream associated with the exception followed by the
description of the error.
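A minimal sketch of an exception class with this shape is given below. It is hand-written for illustration and makes several assumptions: what() is declared const and noexcept to match the modern std::exception signature, the "line:column: reason" text produced by the string conversion is an invented format, and checkDigits is a toy function standing in for the scanning method.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>

struct Position {
  int ln, col;
  Position() : ln(0), col(0) {}
  Position(int ln_, int col_) : ln(ln_), col(col_) {}
};

class ScanException : public std::exception {
 public:
  Position pos;

  ScanException(const std::string &reason = "Scan exception")
      : reason_(reason) {}
  ScanException(const Position &pos_, const std::string &reason = "")
      : pos(pos_), reason_(reason) {}

  // Description of the lexical error.
  const char *what() const noexcept override { return reason_.c_str(); }

  // Position followed by the description; the exact format is an assumption.
  operator std::string() const {
    std::ostringstream out;
    out << pos.ln << ":" << pos.col << ": " << reason_;
    return out.str();
  }

 private:
  std::string reason_;
};

// Toy lexical rule: throws on the first character that is not a digit.
inline void checkDigits(const std::string &line, int ln) {
  for (std::string::size_type i = 0; i < line.size(); ++i)
    if (line[i] < '0' || line[i] > '9')
      throw ScanException(Position(ln, static_cast<int>(i) + 1),
                          "unexpected character");
}
```

A handler would catch the exception by const reference and either read pos and what() separately or convert the whole object to a string for a one-line diagnostic.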
4.3 The parser class' sources
These sources contain the parser class and the ParseException class,
which is used by the syntax error handling (see Sec. 17).
If present, the block of code that precedes the syntax section in
the grammar file will also be included at the beginning of the parser's
header file, which roughly looks like this:
... preamble user code, if any ...
namespace cppcc
{
class ParseException { ... };
class FooParser { ... };
}
4.3.1 The parser class
The parser class encapsulates the automaton that accepts inputs conforming
to the grammar specified in the syntax section. As each syntactic
expansion is matched against the input token stream, the associated
user actions are executed, if any. Each instance of a parser will
use a scanner object to filter its input stream and transform characters
into tokens. The main features of the parser class are:
- cppcc::FooParser
- contains a parser for the grammar specified
in the syntax section
- public fields:
- FooScanner scanner;
- the scanner object used to retrieve
tokens from the input stream.
- FooToken *token;
- points to the token object that was
most recently retrieved from the scanner (aka the current token).
- public constructors:
- FooParser (istream *in = NULL);
- creates a new parser
object that will read from the given stream. The pointer to the initial
input stream is passed on to the scanner's constructor.
- public methods:
- parsing methods: for each production found in the syntax section
of the grammar file, a public method will be generated. The formal
argument list and return type are exactly those specified in the
production's declaration. The exceptions specified in the throw
clause of each production will also appear in the throw clause of
the corresponding method. Additionally, if the USE_EXCEPTIONS
option is set to true, the ios_base::failure, ScanException
and ParseException exceptions will also be appended to the
exception list of each method. Each of these methods can be called
from the user's application code in order to start the parsing process;
the called method will not return until the parsing is either completed
or aborted due to an error.
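To illustrate the "one public method per production" layout, here is a small hand-written parser for a toy grammar. Nothing in it is generated by CppCC: the grammar, the class name SumParser, and the ParseError type (a stand-in for ParseException, derived from std::runtime_error for brevity) are assumptions, and the parser reads characters directly instead of going through a scanner object.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for the generated ParseException (assumption).
struct ParseError : std::runtime_error {
  explicit ParseError(const std::string &why) : std::runtime_error(why) {}
};

// Hand-written stand-in for a generated parser of the toy grammar
//   sum    : number ( '+' number )* ;
//   number : DIGIT+ ;
class SumParser {
 public:
  explicit SumParser(std::istream *in) : in_(in) {}

  // One public method per production; either one can start a parse.
  int sum() {
    int total = number();
    while (peek() == '+') {
      in_->get();
      total += number();
    }
    if (peek() != std::char_traits<char>::eof())
      throw ParseError("unexpected input after sum");
    return total;
  }

  int number() {
    if (!std::isdigit(peek())) throw ParseError("number expected");
    int value = 0;
    while (std::isdigit(peek())) value = value * 10 + (in_->get() - '0');
    return value;
  }

 private:
  int peek() { return in_->peek(); }
  std::istream *in_;
};
```

Application code would construct the parser on a stream, call the method of the production it wants to start from, and wrap the call in a try block to handle syntax errors.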
4.3.2 The ParseException class
This class is used by the parser for signaling syntax errors. When
such an error occurs in one of the parser's methods, a new object
of this type is created, and the location within the input file and
the reason for the error are stored inside it before it is passed to the
user's syntax error handler and/or thrown up the call stack. The
main features of the ParseException class are:
- cppcc::ParseException
- contains the description of a syntax
error
- public fields:
- Position pos;
- the position relative to the beginning of
the input stream where the error occurred. The position is the one
reported by the scanner's getCurrentPos() method.
- public constructors:
- ParseException (const string &reason_ = "Parse exception");
- creates
a new ParseException object with no associated position and the given
description.
- ParseException (const Position &pos_, const string &reason_ = "Parse exception");
- creates
a new object with pos initialized to the given Position and that will
return the given string as the result of its what() method.
- public methods:
- char* what ();
- reimplemented from the std::exception
superclass. Returns a string containing a description of the syntax
error that occurred.
- operator string ();
- returns a string containing the position
associated with the parse exception followed by the exception's description.
Alec Panovici
2003-02-01