File ChangingParser.html    Author McKeeman    Copyright © 2007    index

Changing the xcom Parser

Preliminaries

In MATLAB, in the xcom directory, type (or cut from here and paste to the MATLAB command line)

>> help Parser.m
>> xcom -parseTrace -parseDump x:=1
or, to run the Parser standalone
>> load cfg cfg               % prepare to run Parser standalone
>> EOL = 10;                  % ASCII newline
>> lx = Lexer(cfg, ['x:=1;' EOL 'y:=x']);
>> pr = Parser(lx)
The public fields and methods are visible, so try
>> pr.dump()

Read the general description of parsing.

LR Parsing

The LR(1) parser is implemented in 20 or so lines in method LR1 of Parser.m. There is no reason to change these lines. Ever. The LRerror method, on the other hand, can always use improvement.

Recursive Parsing

The recursive recognition methods in Parser.m reflect the IOG for X. If you change the CFG, then the change needs to be propagated to the IOG. The recursive methods need to reflect the new/changed CFG rules.

Each recursive method calls one or more of accept, reject, step and reduce. Each call of step moves ahead in the token stream, changing variable tok, and reporting the shifted token. A call to accept first checks the current token is what is expected, calls shift if it is, and calls reject if it is not. A call to reject issues a diagnostic and aborts the compilation.

Immediately after a successful test for a token, for example

    if tok == rw.PLUS
the step method is always called to discard the current token and get the next one. Method accept calls step on its own.

Each call of reduce reports a reduction. The calls to reduce occur exactly at the place the output symbols indicate in the IOG. The combined effect is to build the shift/reduce sequence.

The only consumer of the shift/reduce sequence is the tree builder, and it needs no other information.

Hard Cases

One can write a CFG that is difficult to parse recursively, even if it passes the LR(1) restrictions. On the other hand, the recursive author can cheat as necessary, basing decisions on any information that is available. The bottom-up parser needs correct tables or it fails.

Recursive parsers and LR parsers have been written for most popular languages, such as C, C++ and Java.