A Compiler Written in MATLAB
- There is a compiler named xcom.
- There is a traditional compiler theory course based on xcom.
- The latest version is on the MATLAB File Exchange (# 20419).
Contents
Getting Started
The file README.html in the mxcom directory gives a structured introduction to the compiler and course.
Running xcom
Once you have the FX version unpacked into a directory you can compile and run a simple program in the X language:
xcom y:=1
y := 1
The meaning of the X program 'y:=1' is 'assign integer 1 to variable y. Since the assignment is wasted (variable y is not subsequently used), xcom reports the final value of y as output. The remaining details of the little language X are given in the X reference manual in the downloaded compiler+course. X is intended as a minimal base language to which the compiler writing student will add whatever constructs are of interest.
xcom Flags
The compiler comes with a comprehensive set of flags, mostly designed to produce intermediate output of the inner workings of xcom. For instance, the Intel X86 machine language for a program can be dumped as follows
xcom -asmTrace y:=1
Asmx86: 1 pushR EBP # save x86 frame pointer Asmx86: 2 movRR EBP,ESP # new x86 frame Asmx86: 4 movRP ESI, 8 # point at new X frame Asmx86: 7 pushA # callee save Asmx86: 8 movRC EAX,=1 (0x1) Asmx86: 13 movMR 0,EAX (y) Asmx86: 19 popA # callee restore Asmx86: 20 xorRR EAX,EAX # no run error Asmx86: 22 leave # restore previous x86 stack frame Asmx86: 23 ret # restore previous EIP x86 code dump, 23 bytes code=0x1d95e910 5589e58b750860b8010000008986000000006133c0c9c3 rc = 0, run time = 4.02286e-005 sec frame dump, 2 word(s) 0x00000001 0x00000000 y := 1
What you see is almost all prolog and epilog code (standard for any X program). At byte 8 you can see the integer constant 1 being loaded into x86 register EAX, and at byte 13 you can see the move that puts the answer at offset 0 in the frame (where variable y is located). Information about the flags can be see via the command
help xcom
FILE: xcom.m
PURPOSE: compile and go for X
USAGE: XCOM 'x:=1'
Compile and go for the X program in string 'x:=1'.
USAGE: XCOM('x/src.x')
Compile and go for the X program in file x/src.x.
USAGE: XCOM(flagsN, srcN, ... flags2, src2, flags1, src1)
Compile and go; src1 is the main program.
FLAGS:
global flags
-noExecute do not execute compiled program
-xcomTrace trace xcom (main program)
-xcomTime time xcom compiler
-frameDump dump main X frame after execution
-matlabStack dump MATLAB stack on error
-interactive interpret compiled subprograms (interactive)
-emulate interpret compiled subprograms (nonstop no output)
-exeTrace interpret compiled subprograms (nonstop w/output)
-noAST use syntax tree instead of AST
per-file flags
-srcDump dump the source for this file
-lexDump dump the lexemes
-parseDump dump the shift/reduce sequence
-treeDump dump the syntax tree
-symDump dump the symbol table
-asmDump dump the hex assembly code
-texDump dump LaTex version
-pdfDump dump pretty printed version
-parseTrace trace the parser
-treeTrace trace the syntax tree construction
-symTrace trace the symbol table construction
-genTrace trace the generator actions
-emitTrace trace the emitter actions
-asmTrace trace the assembly code construction
-asmHex trace the assembly code at byte level
-bottomUp use LR1 tables instead of recursive parser
EXAMPLES:
xcom x:=1
xcom x/smoke.x
xcom('-symDump', 'x/called.x', '-asmTrace', 'x/caller.x')
OVERVIEW:
XCOM is a compile-and-go implementation of the X language.
If there is no implementation for the underlying hardware,
the 'go' step will be INtel x86 emulation (slow).
The form of the language is given in file X.cfg.
It is patterned after Dijkstra's language in
{\em A Discipline of Programming}.
The meaning of the language is largely conventional.
There are three types (logical, integer and real).
They map into true/false, 32 bit ints and IEEE 32bit floating point.
The type of constants is manifest in their form.
The arguments to XCOM are strings.
If an argument is valid path name to an X source file (dot-x),
the file will be read; if the argument is a valid flag,
it will be used to direct the next compilation;
otherwise the argument itself is taken as X source.
If xcom has more than one source input,
the last one is taken as the main program; the rest are subprograms.
The X language is strongly typed.
The type of variables is inferred from use.
It is possible to write an X program where the type cannot be inferred;
XCOM will report an error.
The final value of variables used only on the left of assignments in
the main program will be reported as output.
The value of variables never used on the left in the main program
will require input prior to execution.
X programs may call each other. The subprogram inputs and outputs map
in order into arguments and returned values.
X programs can call C and/or M functions.
The design focus for xcom is ease of understanding and extensibility.