File ChangingX.html    Author McKeeman    Copyright © 2007

Changing the X CFG

Changing X

The first action in changing the X language is changing the X CFG. The initial X CFG describes an implementable language. A new construct is added to X by adding one or more rules to the CFG. The new rule(s) must also be implementable, which means that corresponding changes to the rest of the compiler must be possible. Knowing how to design implementable rules is an art; your skill will improve with practice.

You should, of course, use a new name for your new CFG. If the Marx Brothers did a language project, they might have called their grammar CHGGZ.cfg. My name, X, is meant to imply that anything can be substituted for it.

The simplest way to start is to pick a rule that "is like" the others, then carry through the implementation, as in the warm up exercise. After a while, you can be more adventurous because you will better understand the implications of your choices.

The Original X CFG

The easiest way to get familiar with the X CFG is to run
>> tryCfg();
One of the trials reads and dumps the result of CFG analysis for X.cfg. Perhaps even better, do the test steps by hand one at a time:
>> grtxt = xread('X.cfg')
You will see the raw text of the current X.cfg.
Then try
>> grobj = Cfg(grtxt)
You will see the Cfg object (which is itself not very interesting). The object output does, however, hint at more information. The sizes of V, Vi, Vn, and the reserved word count are explicit. The difference between the size of Vi and the reserved word count is explained by the three "special" input symbols id, integer and real, which are not reserved names, but rather "reserved classes" handled later by the lexer. If you want all the data in one go, just type
>> grobj.dump()
otherwise just look at one public object field at a time, as in
>> grobj.Vn

Everyday Cfgs

The format of an everyday cfg is given by four simple rules.
stmt
  selection
  iteration
  assignment

Manufactured Rule Names

The rule names are of particular interest. Try
>> grobj.ruleNames
What you will see is manufactured rule names, one for each rule of the CFG. The Cfg object makes the names by catenating all the characters in the rule, inserting an underbar ('_') between the phrase name and each definition, and using character identifiers for each non-letter. The result is a name that is visually identifiable with a particular grammar rule, and will not change unless that grammar rule itself changes. This is a plus when you are changing rules during development. The trickery, a kind of name mangling, is tucked away in object idCtor.m.

Adding Rule(s) to X.cfg

The typical change to X.cfg adds a few rules. You might want to look at the warm up exercise for an example. The format of X.cfg is straightforward. It is an everyday grammar. A new reserved word or operator is added by including it in a right-hand side of some rule. The Cfg object classifies it as an input symbol because it does not appear in the left margin (which would make it a phrase name). You must, of course, keep your Vi and Vn separate. After every change, it is a good idea to make a Cfg object and look at the results to confirm that the changes are what you desire.

Adding Good Rule(s) to X.cfg

Conforming to the layout of X.cfg and satisfying Cfg.m is straightforward. Making a usable cfg takes more skill. Read the advice about language design. Start small. In the long run, as your ambitions increase, you can consider more radical departures from current practice.

Adding a Data Structure

The most obvious data structure is a vector. Following the C tradition one might imagine writing
x := y[i];
y[i+1] := 13.1;
y := [1.1,2.0,-rand];
Any of these three lines should cause y to be entered into the symbol table as a real vector.

Adding a Phrase Name for a Class of Symbols (e.g. string)

Suppose you want to implement strings, and need 'xyz' to be a token in the same way that 123 is a token. You use the symbol string in the CFG in one or more rules. If you do nothing else, string will be a reserved word since it does not appear on the left. There is a place in Cfg.m where the reserved word table is built. It has exceptions for id, integer and real. Add string to the exception list.

Note: it is common in grammar input languages to use a special form to indicate reserved classes. For example, <ID> or <STRING>. If you like that convention, you can implement it. In any case, the lexer will have to be changed to classify and collect the information for the new class of symbols. See the treatment of identifiers for an analogous case.

Making the CFG tables

The CFG tables only need to be made when you change X.cfg. If you are just experimenting and do not want to clobber the tables being used by xcom, run
>> makeCfg X.cfg
and be prepared to wait a minute for the LR tables. If you get an error-free run and you do want to use the tables, run
>> makeCfg -saveMat X.cfg
The newly computed tables will be put into a file cfg.mat. xcom uses cfg.mat. xcom will warn if cfg.mat is out of date with respect to X.cfg. You do not have to use the LR tables if you do not want to (see the discussion on bottom up vs. top down parsers), but it is a good idea to make a grammar that is LR(1). If you are determined to press ahead without satisfying the LR(1) constraints, run
>> makeCfg -noLR -saveMat X.cfg

Changing the Recursive Parser

See details here.

LR errors

It can be tedious to get a cfg to conform to the LR(1) constraints. At some point I will add better LR(1) failure diagnostics.
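
A sketch of symbol classification

The classification described under "Adding Rule(s) to X.cfg" and the reserved-class exception for string can be tried out in a few lines outside MATLAB. The Python sketch below is not the code in Cfg.m. It assumes a layout in which each phrase name stands alone in the left margin, each definition is one indented line, and symbols are separated by blanks. The toy grammar and the names classify, TOY_GRAMMAR and RESERVED_CLASSES are made up for illustration.

```python
# Hypothetical sketch of how a Cfg-like object might sort symbols.
# This is NOT Cfg.m; the grammar layout and all names are assumptions.

# Reserved classes are input symbols that are not reserved words.
# "string" is added here, as the section on classes of symbols suggests.
RESERVED_CLASSES = {"id", "integer", "real", "string"}

# A toy grammar (not X.cfg): phrase names in the left margin,
# one indented definition per line.
TOY_GRAMMAR = """\
stmt
  selection
  iteration
  assignment
assignment
  id := expr
expr
  id
  integer
  real
  string
selection
  if expr then stmt fi
iteration
  while expr do stmt od
"""


def classify(grammar_text):
    """Return (Vn, Vi, reserved) as sets of symbol strings."""
    phrase_names = set()
    rhs_symbols = []
    for line in grammar_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: one definition (a right-hand side).
            rhs_symbols.extend(line.split())
        else:
            # Text in the left margin: a phrase name.
            phrase_names.add(line.strip())
    vn = phrase_names
    # Anything used on the right that never appears on the left is input.
    vi = {s for s in rhs_symbols if s not in vn}
    # Reserved words are the input symbols that are not reserved classes.
    reserved = vi - RESERVED_CLASSES
    return vn, vi, reserved


vn, vi, reserved = classify(TOY_GRAMMAR)
print("Vn:      ", sorted(vn))
print("Vi:      ", sorted(vi))
print("reserved:", sorted(reserved))
```

With string left out of RESERVED_CLASSES, the same run would report string among the reserved words, which is the behavior the text warns about. The gap between the size of Vi and the reserved word count is the number of reserved classes the grammar actually uses.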