File Parsing.html    Author McKeeman    Copyright © 2007    index

Parsing

Phrase Structure

The parser accepts a token stream, discards whitespace and comments, and builds a shift/reduce sequence according to the phrase structure described by the context-free grammar. The shift/reduce sequence is a record of tokens used and CFG rules applied.

The shift/reduce sequence is stored as an array in the parser object. A shift is represented by a positive integer (the token index) and a reduce is represented by a negative integer (the rule number). Access to it is supplied by parser methods.

There are two competing parsing algorithms called respectively, bottom-up and top-down. Each has their advantages and disadvantages. In xcom the two kinds of parsers are both implemented and either can be used without changing anything else in xcom. The buzz-word is plug-compatible. The performance is about the same.

>> xcom x:=1              % run xcom using top-down parser
>> xcom -bottomUp x:=1    % run xcom using bottom-up tables

In the real world, the bottom-up techniques require supporting software (usually a version of yacc). The bottom-up technology is well understood and leads to parsers about which one can prove some properties.

The top-down techniques can be implemented in any recursive programming language. Like any program, a top-down parser can have bugs. There are also top-down parser generators.

Experts argue endlessly about which technique is better in what circumstances.

The following two links provide descriptions of the two techniques.