File Cfg.html    Author McKeeman    Copyright © 2007    index

Implementation of Cfg Data Structures

Cfg.m is an object. It holds the information about the input grammar. The object is passed through the various phases of xcom where the information is used as needed.

Experiments with Cfg

xcom actually has two parsers, one top-down and one bottom-up. The preparation of the bottom-up tables take a little bit of time. It can be convenient to skip making these tables if they are not immediately needed. Try

>> c = Cfg(xread('X.cfg'), '-noLR');
>> c
to see the Cfg tables (excluding the LR(1) entries). The contents of any particular table can then be examined, for example
>> c.Vn
gives you the phrases names.

There are dumper and control flags which can be seen by the call

>> help Cfg
There are a number of simple grammars to try out. Type
>> dir *.cfg
For the smaller grammars there is no reason to avoid the LR table building.

Making Cfg Tables for xcom

When you are ready with a new file X.cfg, you can make tables for xcom.
>> makeCfg -saveMat X.cfg

Everyday Grammar into MATLAB Arrays

The everyday format for grammars is chosen to make it easy to read the grammar and put it into useful data structures. The immediate task is to capture R, VI and VN. Unique symbols need unique integer representatives, so the strings are collected in cell arrays of strings; the cell index is the representative.

It is helpful to renumber the symbols so that all VI are less than all VN.

Rule Names

When using yacc, the interpretation of the rules is stuck right into the input grammar (VO). In xcom, the connection is made by giving a unique name to the rule to be used later when the rule needs to be interpreted. The names need to be such that later changes to the CFG do not require systematic changes to all the rule names. Consider the everyday CFG for proposition. The rules are simply numbered. Inserting a rule would make all the subsequent numbers off by one. The convention used in xcom is to make a long name which is essentially the rule itself with blanks supressed and special characters replaced by mnemonic letter sequences. For example, X rule:

disjunction
  disjunction | conjunction
turns into rule name
 'disjunction_disjunctionORconjunction'
The cell array of rule names is turned into an enum for efficient access.

Operator Names

As the CFG is processed, every symbol is either a phrase name or reserved. Among symbols, some are separators (like '(') and some are operators (like '+'). The xcom convention is to consider separators as single-character tokens, and the rest as operator identifiers. The choice is hand-coded in Cfg.m and must be changed it new separators are desired.

Names for Classes of Symbols

There are four technically reserved words in X.cfg that are not really reserved. The symbol 'id' stands for all identifiers. The symbol 'integer' stands for all integer literals. The symbol 'real' stands for all real literals. The symbol 'eof' stands for an unprintable symbol (usually 0x0). Each of them is made invisible to the table lookup for identifiers (by prepending a blank so that then cannot be found). The reason is that a user might want to write

  id = 3;
and wonder why 'id' was not allowed as an identifier.

Head Symbols

Sometimes the parser needs to make a decision based on the next input symbol where any of a large set of possibilities is acceptable. In particular, when deciding whether the next input symbol could start an expression, anything that can start an expression leads to "yes". Grammatically speaking, these symbols are heads of expr. The CFG object provides a function

   head(phraseName, symbol)
to fill this need.

The computation of this information requires the closure over a relation. The relation starts with immediate head: s is an immediate head of N if s is the first symbol in a rule defining N. If that first symbol is an erasing symbol, then the second symbol is also an immediate head, and so on. Given the relation of immediate heads, the transitive closure computation gives all heads.

The Tables

The available information in the Cfg tables that will be used by xcom can be seen by

>> load cfg cfg
>> cfg
>> cfg.V  % for instance