File Topdown.html    Author McKeeman    Copyright © 2007    index

Top-down Parsing

The top-down method relies on the IOG as a programming template to specify some mutually recursive functions. Each function is responsible for one phrase name, so the total number of such functions can be predicted by examining the CFG. For X, there are 55.

The whole process of writing the recursive processor is presented in a simplified version based on the Proposition Cfg in the class handout.

The first steps toward writing the parser functions use grammar transformations. An experienced parser author sometimes does not bother with explicit transformations, doing them mentally at the keyboard while preparing the parser.

Label the Rules

The first step is to label the CFG rules in a systematic way. In effect, this changes the context-free grammar into an input-output grammar. The labels are the members of VO. One can merely number the rules, as is done for Proposition in the section on context-free grammars, or use moderately mnemonic labels incorporating the phrase name as done in the labelling example for X, or use wholly mnemonic labels as is done in xcom where the second rule for vars

vars
  vars , id

is labelled

  vars_varsCOMMAid

The issue is maintainability. Making a change in xcom should not cause the programmer to have to change unrelated names (as would be the case in first two example labels above). The labels in xcom are in fact generated in a predictable way by the Cfg object based on the whole text of each rule. To see them, in MATLAB, type

  >> c=Cfg(xread('X.cfg'));
  >> c.ruleNames

The input-output grammar is closely analogous the the yacc input language. The equivalent of the xcom rule labels in the IOG are yacc actions expressed as C text enclosed in curly braces.

Construct the IOG

The second step is to transform the labelled CFG into the regular expression form of an IOG for X, removing left recursion and dragging the rule labels along. Changing the CFG to the free-form notation to distinguish phrase names, input symbols, and output symbols is part of the task.

Write Recursive Parsing Functions

The third step is to implement the recursive functions in analogy to those for Proposition. For X, the functions are methods in mxcom/m/Parser.m. The current token must be updated immediately after passing any token comparison test. The output actions have to be turned into reduce calls. The entire effect of parsing is to produce a shift/reduce sequence.

Test Early and Often

The fourth step is to implement a unit test for the parser. The file mxcom/m/tests/testParser.m is such a test, but it is nothing to be proud of since it depends on a "gold file" of prerecorded results. My feelings will not be hurt if you do a much better job.

Making a Small Change in the Parser

See the details here.

Why not Something Simpler?

There have been many unique solutions to the parsing problem over the years, from the earlier Fortran compilers onward. What has been shown above is one of the best.

One can imagine not using either the REG or IOG and still be able to write recursive parsers. You can. The transformations leading to the recursive parsing template are, generally speaking, not that difficult to do in your head while you are typing.

What you cannot do is skip the transformations. Take, for example, the rules

Disjuction 
  Disjunction | Conjunction
  Conjunction

One could write

function Disjunction()
  if (???)
    Disjunction();
    accept('|');
    Conjunction();
    reduce(1)
  else
    Conjunction();
    reduce(2);
  end
end

The first problem is what to put in place of the ???. The meaning of the ??? has to be "if there is a '|' coming and nothing is going to get in the way (like a ')').

The second problem is that Disjunction calls itself immediately, which will lead to a stack overflow and no answer.

Cfg left-recursion has to be turned into an iteration, in your head or in your IOG.

On the other hand, if the rules are tried in the opposite order,

Disjuction 
  Conjunction
  Disjunction | Conjunction

you could try backtracking from any point of failure and failing over to the second alternative. This solution works but is not efficient.

And, of course, the bottom-up methods also will work and are competitive. The table builder is usually a bit tedious to work with but the eventual parser is "as good as it gets".


Footnote:

I learned top-down compiling by reading the Burroughs B5000 Algol-60 compiler in 1963. It was self-compiling. Kudos to Lloyd Turner and Dave Dahm. I learned bottom-up compiling while implementing part of Euler for Niklaus Wirth and Helmut Weber.