- ... guide1
- http://sourceforge.net/projects/cppcc
- ... opinion2
- By the time I'm writing this, I have already played with it a bit
(the examples subdir of the distribution contains the results of that)
and found it quite pleasant to use.
- ... input3
- See Annex C for the command line syntax.
- ... parser4
- The full lexical and syntactic specifications are provided in Annex B.
- ...5
- Note that this option only affects non-LL(1) choices, which
are automatically detected by CppCC. In all other situations,
a single token is inspected, no matter how large the default lookahead
is. Also, if a local lookahead hint is specified at a non-LL(1)
choice point, the value of this option is disregarded and a warning
is issued.
- ...6
- The reason is that, although they play no part in the parsing
process itself (and are therefore never returned to the parser),
a user may still want, for instance, to remember the comments
that appeared in an input source (comments are the most common
case where SKIP tokens are used). A user action can then
store the comment token images somewhere for later use.
- ... code7
- The idea behind this is that a keyword's image is a fixed
string representation specified by the language's grammar,
so the image found in the input file would be redundant (an
IF keyword can only have as its image the string "if",
and so on).
- ... name8
- Therefore, it is correct to use the notions of a token's id and its
name interchangeably, as they are in one-to-one correspondence.
- ... fields9
- The only ``restriction'' is that the block must have balanced
parentheses. CppCC relies on this to find out where the block ends.
- ... parser10
- Note that, if a token action was also defined for that token, it is
executed before the commonTokenAction method is called.
Also, this method is only called once after the token has been matched,
even if the token is returned more than once to the parser (this situation
may occur during the parser's lookahead).
- ... returns11
- Because the generated parser is recursive-descent, the called method
will most probably call other methods in turn as part of the parsing
process.
- ... directly12
- Actually, I think string matching will be faster using a regular expression
than anything one could write by hand.
- ... executed13
- In case it is not obvious, the reason is that when a production is expanded
during lookahead evaluation, no ``real parsing'' is performed,
just a ``peek'' at what follows on the input stream. If the lookahead
failed, those user actions should never have been executed.
- ... lookahead14
- See the sample grammar for CppCC's input file in the CppCC distribution
for an example of how to use it. Additional restrictions apply to
such user actions: for instance, they cannot use the normal environment
guaranteed during normal functioning of the parser (the current token,
and so on) and, because the lookahead code can return as soon as it has
decided whether the lookahead succeeded, they are not even
guaranteed to be executed.
- ... point15
- CppCC will detect such points and signal them to the user. It will
also detect useless lookahead hints, point them out and ignore them.
The only exception is the semantic lookahead, which is never
ignored at a choice point (it is intended to act more as a supplementary
user-defined condition in the parsing algorithm).
- ... is16
- See Annex B for the exact syntax.
- ... taken17
- Note that the syntax is a bit misleading, in that the lookahead is
associated with an expansion, but it actually affects the decision
at the level within which that expansion is included as an operand.
For instance (LOOKAHEAD(3) THIS | OTHER) means ``when
having to choose between THIS and OTHER, use a lookahead
of up to 3 tokens'', although the LOOKAHEAD itself is associated
with the THIS expansion.
- ... <TWODOTS>)18
- The parentheses surrounding the LABEL <TWODOTS> expansion
are mandatory. Their purpose is to tell CppCC that the lookahead applies
to the sequence as a whole; otherwise it would only apply to the expansion
that immediately follows it, i.e. LABEL, which is not what we want.
- ... lookahead19
- This is a real-life example taken from the Java Language Specification.
- ... productions20
- This example is adapted from the VHDL93 grammar.
- ...{ public: myMethod(...) { ..... }}21
- CppCC is smart enough to skip strings and comments inside user code,
so they can contain any number of curly braces.
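
To illustrate the idea of locating the end of a user code block by brace balancing while skipping strings and comments, here is a minimal sketch. It is not CppCC's actual implementation; the function name `findBlockEnd` and its behavior on malformed input are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CppCC): returns the index of the '}'
// that closes the '{' found at position `open`, or std::string::npos
// if the braces are not balanced.
std::size_t findBlockEnd(const std::string& src, std::size_t open) {
    int depth = 0;
    for (std::size_t i = open; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // String or character literal: skip to the closing quote,
            // stepping over backslash escapes.
            const char quote = c;
            ++i;
            while (i < src.size() && src[i] != quote) {
                if (src[i] == '\\') ++i;
                ++i;
            }
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            // Line comment: skip to the end of the line.
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {
            // Block comment: skip to the closing "*/".
            i += 2;
            while (i + 1 < src.size() && !(src[i] == '*' && src[i + 1] == '/')) ++i;
            ++i;  // land on the '/' that ends the comment
        } else if (c == '{') {
            ++depth;
        } else if (c == '}') {
            if (--depth == 0) return i;
        }
    }
    return std::string::npos;
}
```

Braces that appear inside the literals and comments never touch `depth`, which is why such text may contain any number of them.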
- ...22
- Of course, this does not mean that other standard exceptions, such as
std::bad_cast, are guaranteed not to be thrown.
- ...bool onIOError (ScanException &s)23
- This will change in future versions, as soon as GNU's gcc C++
library implements the ANSI standard I/O error handling through
exceptions.
- ... namespace24
- For example, when several CppCC-generated parsers are part of the same
project, different namespace names are needed to avoid conflicting
definitions.
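
As a small illustration of why distinct namespaces help, the sketch below puts two classes with the same name into one translation unit. The namespace names `calc` and `config`, the class name `Parser` and the `describe` method are all hypothetical; they do not reflect the names CppCC actually generates.

```cpp
#include <string>

// Two hypothetical generated parsers. Both define a class called Parser,
// which would clash if they lived in the same namespace.
namespace calc {
class Parser {
public:
    std::string describe() const { return "calc"; }
};
}  // namespace calc

namespace config {
class Parser {
public:
    std::string describe() const { return "config"; }
};
}  // namespace config
```

Client code then selects a parser by qualified name, for example `calc::Parser` or `config::Parser`.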
- ... preserved25
- Note that the length of the token is a proper field and therefore
changes to the image will not be reflected in it; its value will remain
set to the initial value filled in by the scanner.
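
The consequence can be sketched with a hypothetical token layout. The `Token` struct and `makeToken` function below are assumptions for illustration only; CppCC's real token class may be organized differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical token layout: the length is stored as its own field
// rather than being computed from the image on demand.
struct Token {
    std::string image;
    std::size_t length;
};

// What the scanner is assumed to do when it matches a token:
// record the image and, once, its length.
Token makeToken(const std::string& matched) {
    return Token{matched, matched.size()};
}
```

If user code later assigns a shorter or longer string to `image`, `length` still holds the size of the text that was originally matched.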
- ... refilled26
- These are the most common situations a scanner needs to deal with.
The first is file inclusion, in which a new input stream must be opened
at a certain point and, once it has been read completely, the scanner must
be switched back to the old input stream at the point where it was
left. The second occurs when multiple files are to be processed
sequentially.
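
The file-inclusion case can be pictured as a stack of input streams. The following is a minimal sketch under that assumption; the `InputStack` class and the `demoInclusion` function are hypothetical and are not part of the CppCC scanner interface, which also buffers its input internally.

```cpp
#include <istream>
#include <sstream>
#include <stack>
#include <string>

// Hypothetical wrapper that reads from the most recently pushed stream
// and falls back to the suspended one when the current stream ends.
class InputStack {
public:
    explicit InputStack(std::istream& first) { streams_.push(&first); }

    // File inclusion: suspend the current stream and read from `included`.
    void push(std::istream& included) { streams_.push(&included); }

    // Reads the next character. Returns false only when every stream
    // on the stack has been exhausted.
    bool get(char& c) {
        while (!streams_.empty()) {
            if (streams_.top()->get(c)) return true;
            streams_.pop();  // current stream is done, resume the previous one
        }
        return false;
    }

private:
    std::stack<std::istream*> streams_;
};

// Reads one character of the outer stream, "includes" the inner stream,
// then reads everything that is left.
std::string demoInclusion() {
    std::istringstream outer("ab");
    std::istringstream inner("XY");
    InputStack in(outer);
    std::string out;
    char c;
    if (in.get(c)) out += c;
    in.push(inner);
    while (in.get(c)) out += c;
    return out;
}
```

Processing several files sequentially fits the same shape: each new stream is pushed only after the previous one has been exhausted.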
- ... character27
- Note that, because of the scanner's internal buffering, this
is almost never equivalent to getInputStream().get(). The two coincide
only when the internal buffer is empty.
- ... tokens28
- The scanner itself does not treat characters like '\n'
or '\r' in any special way; it is the user's
responsibility to handle line terminators. However, the column counter
is always incremented as characters are read.
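
One way to picture this division of labor is sketched below. The `Position` struct and the three functions are hypothetical, and resetting the column on a line terminator is an assumption about what a user would typically want, not documented CppCC behavior.

```cpp
#include <string>

// Hypothetical position record kept alongside the scanner.
struct Position {
    int line = 1;
    int column = 0;
};

// Assumed scanner behavior: every character read advances the column.
void advance(Position& p) { ++p.column; }

// A user action attached to a line-terminator token: the user, not the
// scanner, bumps the line number and resets the column.
void onLineTerminator(Position& p) {
    ++p.line;
    p.column = 0;
}

// Walks over `text`, combining the two pieces above.
Position scan(const std::string& text) {
    Position p;
    for (char c : text) {
        advance(p);
        if (c == '\n') onLineTerminator(p);
    }
    return p;
}
```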
- ... scanner29
- This means that a user action will only be executed once, just after
the token has been recognized in the input stream, and never later,
even if the token is returned more than once as the result of a la(k)
call.
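
The behavior described here follows naturally if scanned tokens are cached in a lookahead buffer. The sketch below shows the idea; `LookaheadBuffer`, `Token`, `consume` and `countActionCalls` are hypothetical names for this example, and only `la(k)` mirrors a name used in the text.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Token {
    std::string image;
};

// Hypothetical buffer between scanner and parser. Tokens are scanned at
// most once and kept until the parser consumes them.
class LookaheadBuffer {
public:
    LookaheadBuffer(std::vector<std::string> input,
                    std::function<void(Token&)> action)
        : input_(std::move(input)), action_(std::move(action)) {}

    // la(k): the k-th token ahead (k >= 1), without consuming anything.
    // Throws std::out_of_range if the input has fewer than k tokens left.
    Token& la(std::size_t k) {
        while (buffered_.size() < k) scanOne();
        return buffered_[k - 1];
    }

    // Removes and returns the next token.
    Token consume() {
        Token t = la(1);
        buffered_.pop_front();
        return t;
    }

private:
    void scanOne() {
        Token t{input_.at(next_++)};
        action_(t);  // the user action runs here, once, at scan time
        buffered_.push_back(t);
    }

    std::vector<std::string> input_;
    std::function<void(Token&)> action_;
    std::deque<Token> buffered_;
    std::size_t next_ = 0;
};

// Counts how often the action runs when the same tokens are peeked
// at repeatedly.
int countActionCalls() {
    int calls = 0;
    LookaheadBuffer buf({"a", "b", "c"}, [&calls](Token&) { ++calls; });
    buf.la(2);      // scans "a" and "b"
    buf.la(2);      // served from the buffer
    buf.la(1);      // served from the buffer
    buf.consume();  // hands "a" to the parser
    buf.la(1);      // "b" is already buffered
    return calls;
}
```

Here only two tokens are ever scanned, so the action runs twice regardless of how many times `la` returns them.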
- ...unmodified30
- Of course, if the lexical section is modified, the profiling data
becomes inaccurate, and cppcc will refuse to use it.