File SyntaxTree.html    Author McKeeman    Copyright © 2007    index

Syntax Tree, Abstract Syntax Tree

Syntax Tree

First review context-free grammar theory.

The syntax tree for an X program is completely determined by the X Cfg. The syntax tree is systematically generated by Tree.m from the shift/reduce sequence made by the parser. Each shift causes a token to be stacked. Each reduce represents the application of a Cfg rule. Suppose the Cfg rule has w elements on the r.h.s. The reduce action pops w elements off the stack; puts them into the tree as a node, then pushes the node onto the stack. The elements that were popped are the descendents of the newly-stacked node. Here is a worked example.

The tree consists of internal nodes and peripheral leaves. All of the significant (non-white) tokens from the input are the leaves of the tree. The xcom tree is not designed to be transformed (read-only use).

Tree Walking

The syntax tree is accessed by tree walking methods of the Tree object. Suppose T is a Tree object and n is a node.

Tree method call method description
T = Tree(parseobj, flags) making a Tree object
n = T.getRoot() the unique node for X CFG phrase program
r = T.getRule(n) The name of node n
p = T.getPat(n) The Node/Leaf pattern of node n
w = T.getWidth(n) The number of kids of node n
n = T.getKid(n, i) The ith subnode of node n
t = T.getLeaf(n) The token associated with the (leaf) node n

The whole tree is accessible, in any order, from the root. Method

        n2 = getKid(n1,i)

moves leafward from node n1 to ith subnode n2. Returning from getKid moves rootward.

Nodes and Leaves

Method getRule(n) returns the name of the Cfg phrase for node n. Which subnodes are internal and which are leaves is obvious from the Cfg. If the phrase name for some node n is a leaf (e.g. id), it is an error to call getKid(n,i) on it. Instead, getLeaf(n) returns the token.

The method getPat(n) is redundant in that knowing the Cfg allows the correct choice to be made between getKid(n) and getLeaf(n). On the other hand, it is sometimes convenient to walk a tree without reference to the grammar (for example, a general tree dumper). The Tree fields

T.NONE
T.NODE
T.LEAF
T.OPAQ

can be used to decode the getPat(n) value. The value NONE terminates a pattern. The value OPAQ never appears in xcom; it is intended for extensions when the corresponding node index points to neither an internal node nor to a leaf -- a so-called tree decoration.

Abstract Syntax Tree (AST)

It is in the nature of programming language syntax trees that much of the data is not needed. In particular, the long chains of reduce actions

expr<-disjunction<-conjunction<-negation<-relation<-sum<-term<-factor

required to expression operator precedence all show up in the tree and must be walked over to get to the significant stuff. Also, most of the leaves (such as '+' or parens) are never examined. An abstract syntax tree is one with these nodes snipped out. Such a tree takes less space and time but it takes an additional set of specifications and must be maintained each time the Cfg changes. The worked example shows the results.

In xcom one can choose the full syntax tree with flag
>> xcom -noAST other args.

Writable Trees

It is common, in production compilers, to find writable abstract syntax trees. Once semantic information is gathered about some node, the information is written into the node. In xcom this is never done, but rather the information is kept in the locals of the tree walker or in an auxiliary data structure (for instance, the symbol table). The node type OPAQ mentioned above could be used to flag a writable node.

Implementation

The implementation, in particular the storage layout, is described in the comments of the code.