... guide1
http://sourceforge.net/projects/cppcc
... opinion2
By the time I'm writing this, I have already played with it a bit (the examples subdir of the distribution contains the results of that) and found it quite pleasant to use.
... input3
See Annex C for the command line syntax.
... parser4
The full lexical and syntactic specifications are provided in Annex B.
...5
Note that this option only affects non-LL(1) choices, which are automatically detected by CppCC. In all other situations a single token is inspected, no matter how large a default lookahead was specified. Also, if a local lookahead hint is specified at a non-LL(1) choice point, the value of this option is disregarded and a warning is issued.
...6
The reason is that, although SKIP tokens are not important during the parsing process itself (and are therefore never returned to the parser), a user may still want, for instance, to remember the comments that appeared inside an input source (comments are the most common case where SKIP tokens are used). The user can then attach a user action that stores the comment token images somewhere for later use.
... code7
The idea behind this is that a keyword's image is a fixed string specified by the language's grammar, so the image found in the input file would be redundant (an IF keyword can only have the string "if" as its image, and so on).
... name8
Therefore, it is correct to use the notions of a token's id and its name interchangeably, as the two are in one-to-one correspondence.
... fields9
The only ``restriction'' is that the block must have balanced parentheses. CppCC relies on this to find out where the block ends.
... parser10
Note that, if a token action was also defined for that token, it is executed before the commonTokenAction method is called. Also, this method is only called once after the token has been matched, even if the token is returned more than once to the parser (this situation may occur during the parser's lookahead).
... returns11
Because the generated parser is a recursive-descent parser, the called method will most probably call other methods in turn as part of the parsing process.
... directly12
Actually, I think string matching using a regular expression will be faster than anything one could write by hand.
... executed13
If it is not obvious, the reason is that when a production is expanded during lookahead evaluation, there is no ``real parsing'' being performed, just a ``peek'' at what follows on the input stream. If the lookahead failed, those user actions should never have been executed.
... lookahead14
See the sample grammar for CppCC's own input file in the CppCC distribution for an example of how to use it. Additional restrictions apply to such user actions: for instance, they cannot use the normal environment guaranteed during regular operation of the parser (the current token, and so on) and, because the lookahead code can return as soon as it has decided whether the lookahead succeeded, they are not even guaranteed to be executed.
... point15
CppCC will detect such points and signal them to the user. It will also detect useless lookahead hints, point them out and ignore them. The only exception is the semantic lookahead, which is never ignored at a choice point (it is intended to act more as a supplementary user-defined condition in the parsing algorithm).
... is16
See Annex B for the exact syntax.
... taken17
Note that the syntax is a bit misleading, in that the lookahead is associated with an expansion, but it actually affects the decision at the level within which that expansion is included as an operand. For instance, (LOOKAHEAD(3) THIS | OTHER) means ``when having to choose between THIS and OTHER, use a lookahead of up to 3 tokens'', although the LOOKAHEAD itself is associated with the THIS expansion.
... <TWODOTS>)18
The parentheses surrounding the LABEL <TWODOTS> expansion are mandatory. Their purpose is to tell CppCC that the lookahead applies to the parenthesized expansion as a whole; otherwise it would only apply to the expansion immediately following it, i.e. LABEL, which is not what we want.
... lookahead19
This is a real-life example taken from the Java Language Specification.
... productions20
This example is adapted from the VHDL93 grammar.
...{ public: myMethod(...) { ..... }}21
CppCC is smart enough to skip strings and comments inside user code, so they can contain any number of curly braces.
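As an illustration of the idea only (this is not CppCC's actual implementation), a brace matcher that ignores string literals, character literals and comments might look like the sketch below. The function name findBlockEnd is hypothetical, and the sketch assumes plain C++ lexical rules without raw string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the '}' that closes the '{'
// found at position `open`, or std::string::npos if the block never closes.
// Braces inside string/char literals and comments are not counted.
std::size_t findBlockEnd(const std::string& src, std::size_t open) {
    assert(open < src.size() && src[open] == '{');
    int depth = 0;
    for (std::size_t i = open; i < src.size(); ++i) {
        char c = src[i];
        if (c == '"' || c == '\'') {
            // Skip a string or character literal, honouring backslash escapes.
            char quote = c;
            for (++i; i < src.size() && src[i] != quote; ++i) {
                if (src[i] == '\\') ++i;
            }
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            // Line comment: skip to the end of the line.
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {
            // Block comment: skip to the closing "*/".
            std::size_t end = src.find("*/", i + 2);
            if (end == std::string::npos) return std::string::npos;
            i = end + 1;
        } else if (c == '{') {
            ++depth;
        } else if (c == '}') {
            if (--depth == 0) return i;
        }
    }
    return std::string::npos;
}
```

Called on a block whose body contains a brace inside a string or a comment, the function still reports the position of the real closing brace.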
...22
Of course, this does not mean that other standard exceptions such as std::bad_cast, etc. are guaranteed not to be thrown.
...bool onIOError (ScanException &s)23
This will change in future versions, as soon as GCC's C++ library implements the ANSI standard I/O error handling through exceptions.
... namespace24
E.g. when several CppCC-generated parsers are part of the same project, different namespace names are needed in order to avoid conflicting definitions.
... preserved25
Note that the length of the token is a proper field and therefore changes to the image will not be reflected in it; its value will remain set to the initial value filled in by the scanner.
... refilled26
These are the most common situations a scanner needs to deal with. The first is file inclusion, in which case a new input stream must be opened at a certain point and, once it has been completely read, the scanner must be switched back to the old input stream at the point where it left off. The second case occurs when multiple files are to be processed sequentially.
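The file-inclusion case can be pictured with a stack of input streams. The following is a minimal sketch under that assumption; the InputStack class and its members are hypothetical names used for illustration and are not part of the CppCC scanner interface.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <stack>
#include <string>
#include <utility>

// Hypothetical stack of input streams. Pushing a stream models an
// "include"; when the top stream is exhausted, reading resumes in the
// previous stream exactly where it stopped.
class InputStack {
public:
    void push(std::unique_ptr<std::istream> in) {
        assert(in != nullptr);
        streams_.push(std::move(in));
    }

    // Reads the next character. Returns false once every stream is consumed.
    bool get(char& c) {
        while (!streams_.empty()) {
            if (streams_.top()->get(c)) return true;
            streams_.pop();  // included stream finished: switch back
        }
        return false;
    }

private:
    std::stack<std::unique_ptr<std::istream>> streams_;
};
```

If a stream containing "XY" is pushed after the first character of a stream containing "ab" has been read, the characters come out in the order a, X, Y, b. Processing several files sequentially is the simpler variant in which the next stream is pushed only after get has returned false.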
... character27
Note that, due to the internal buffering done by the scanner, this is almost never equivalent to getInputStream().get(). The two coincide only when the internal buffer is empty.
... tokens28
The scanner itself does not treat characters like '\n' or '\r' in any special way; it is the user's responsibility to handle line terminators. However, the column counter is always incremented as characters are read.
... scanner29
This means that a user action will only be executed once, just after the token has been recognized in the input stream, and never later, even if the token is returned more than once as the result of a la(k) call.
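One way to see why this holds is to imagine the lookahead as a buffer of tokens that have already been scanned. The sketch below assumes such a design; the TokenBuffer class, its scan callback and the Token struct are hypothetical and only mimic the behaviour described here, not the code CppCC generates.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

struct Token {
    int id;
};

// Hypothetical lookahead buffer. `scan` stands for "match the next token
// and run its user action"; it is invoked exactly once per token.
class TokenBuffer {
public:
    explicit TokenBuffer(std::function<Token()> scan) : scan_(std::move(scan)) {}

    // Returns the k-th token ahead (k >= 1) without consuming anything.
    const Token& la(std::size_t k) {
        assert(k >= 1);
        while (buffered_.size() < k) buffered_.push_back(scan_());
        return buffered_[k - 1];
    }

    // Consumes and returns the next token, reusing the buffered copy if any.
    Token consume() {
        Token t = la(1);
        buffered_.pop_front();
        return t;
    }

private:
    std::function<Token()> scan_;
    std::deque<Token> buffered_;
};
```

Repeated la(k) calls for the same position return the buffered token, so a counter incremented inside the scan callback goes up only when a new token is actually read from the input.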
...unmodified30
Of course, if the lexical section is modified, the profiling data becomes inaccurate, and cppcc will refuse to use it.