File Interpreters.html    Author McKeeman    Copyright © 2007    index

Interpreters and Compilers

Source Interpretation

It is possible to obey a program immediately, without ever producing an intermediate form. The HOC calculator and Cleve Moler's research version of MATLAB are examples. Direct source interpretation is competitive if there are no loops and recursions in the source code. The reason is that the compiler has to execute at least once anyway, so there is no extra translation cost in direct interpretation. Any interpreter is reasonably efficient if the interpreted actions take much longer than the interpreter overhead, as was the case of the original MATLAB which was targetted primarily at matrix operations.

Tree interpretation

The X language, however, does have loops and recursion and no big operations. This implies that translation to an efficiently executable code will pay off. The choice for such an executable code is the (tree + symbol-table) intermediate representation. The tree-walker can execute the language instead of making code. The symbol table needs an extra field for the value of the variable; the symbol table and run-time frame are thus combined. Loops just cause the loop sub-tree to be rewalked. MATLAB has used a tree interpreter in some toolboxes for many releases.

Pcode interpretation

The most common interpreter is modelled after the Pascal P-code. In this case the front-end emits an instruction stream that looks like machine language, including push, pop and branch instructions. The Burrough B5000 (1961) was a hardware implementation of such an interpreter and served as a model for the Pascal simulation (1970). The original Java implementation used a byte-code interpreter for a stack machine.

The fastest Pcode implementation is based directly on the shift/reduce sequence (which is already in execution order for expressions). The branching logic (if-fi, do-od) requires stack-like data structures to save fixup points. hyper (included in this distribution) is an example of a shift-reduce based compiler (not an interpreter). If one accepts the cost of building a tree, it is easier to handle the symbol table and branching logic.

Typically a Pcode interpreter runs about 40x slower than compiled code. The reason is the the software fetch-execute mechanism simulates the hardware instruction fetch-execute mechanisms. It takes about 40 instructions per operation to simulate the hardware. The carefully coded Java JVM was about 25x slower than compiled code, partly because Java is hard to compile efficiently. If the operations are on big data (such as arrays), then the 40-instruction overhead is amortized. The second-generation MATLAB interpreter (Bangert) worked this way; the assumption was that most operations were on matrices and therefore the software fetch-execute mechanism did not a seriously degrade performance. Obviously the MathWorks' customers agreed.

One advantage of an interpreter solution is that other things, such as debuggers and threads, are easier to implement.

JIT

The mnemonic JIT stands for just-in-time. The origin of the term is industrial practice. If a needed part is delivered to an assembly work station just in time for it to be installed, the industry profits from reduced inventory and storage costs. In the compiler world, the term implies that the source code is turned into executable form just in time to be executed. Actual implementations of JITs are just-too-late, since the need for the code is discovered when trying to execute it. The customer has to wait for the code to be translated.

The tradeoffs for a JIT are therefore

  • very fast translation, and therefore
  • no serious optimization, and therefore
  • reduced quality (slower run-time execution).

Suppose that each simulated interpreter instruction were implemented as a function call. The form of the interpreter would be a loop containing a switch, and each case of the switch would be a single function call. The simplest JIT instead compiles the same function calls in the order of the interpretable instructions (same effect). This eliminates the interpretable instructions, the loop and the switch, with the obvious performance improvement. The only tricky part is branch instructions, which can no longer be implemented as function calls, but rather must be implemented directly in the hardware branch logic.

Examination of the resulting JIT output leads immediately to the conclusion that some of the subroutine calls can be inlined, again improving speed. This trick can only go so far without introducing code-bloat as more and larger subroutines are inlined. The implementor chooses the sweet-spot and makes a mixture of calls and inlined code. This is an excellent tradeoff when turnaround and flexibility are more important than large-scale computations. One can push the JIT to make ever better code, but only at the cost of slowing the translation. Again, the implementor must choose a sweet-spot, avoiding heavy-duty optimization.

Hybrid JIT-interpreter

There is no reason that execution has to be all-JIT or all-interpreter. The translation mechanism can leave big-operand operators to the interpreter and put small-operand operators into compiled code. Switching back and forth between JIT and interpreter itself introduces some complexity, so again the implementor needs to pick a sweet-spot. Often a JIT takes Pcode input instead of working directly from the source.

Implementation

An interpreted language can be as complicated as required. That is, the concept of interpreter scales up. Here is a particular implementation in C for a language much richer than X.

A General Operand

There can be many types.

bool int8 int16 int32 int64 intv rational real32 real64 real128 realv complex ASCII string UTF8 string pointer

The type of a variable is dynamic, changing as in MATLAB, with each assignment. There are no declarations. Rather than compile this new X, we want to interpret it. There is a C type Operand which is used for all operands. Some, as will be seen, require attached malloc-d store. Assume a 32-bit hardware (size_t(&a))== 4).

Some of the types above are obvious, some need explanation. Type intv is a variable sized integer (represented as a sequence of words). Type realv is a float with variable amounts of accuracy.

Suppose we have an enum type

typedef enum {
    ERROR,
    BOOL,
    INT8,
    INT16,
    INT32,
    INT64,
    INTV,
    RATIONAL,
    REAL32,
    REAL64,
    REAL128,
    REALV,
    COMPLEX,
    ASCII,
    UTF8,
    VECTOR,
    REF
} DynamicType;

and a struct type

    typedef struct {
        union {
            int8_t  error;             /* like NaN             */
            int8_t  bool;              /* 0/1 true/false       */
            int8_t  int8;              /* int,  8 bits         */
            int16_t int16;             /* int, 16 bits         */
            int32_t int32;             /* int, 32 bits         */
            int64_t int64;             /* int, 32 bits         */
            struct {
                int32_t len;           /* negative len is sign */
                uint32_t *digits;      /* alloc-d              */
            } intv;
            float  real32;             /* real, 32 bits        */
            double real64;             /* real, 64 bits        */
            long double real128;       /* real, 128 bits       */
            struct {                   /* real, varying length */
                int32_t len;           /* negative len is sign */
                int32_t exp;           /* 2^exp factor         */
                uint32_t *digits;      /* alloc-d              */
            } realv;                   /* 2^31 32-bit words    */
            struct {                   /* rational             */ 
                Operand *num;
                Operand *denom;
            } rational;
            struct {                   /* complex              */ 
                Operand *rp;
                Operand *ip;
            } complex;
            char *ascii;               /* zero terminated      */
            char *utf8;                /* zero terminated      */
            struct {                   /* vector               */
                int32_t len;
                Operand *elem;
            } vector;
            Operand *ref;              /* lhs and pointers     */
        } data;
        DynamicType type;              /* selector             */
    } Operand;

An object of type Operand stores any of the fixed size types in 128+8 bits and uses pointers to allocated storage for the rest.

Operations

From the interpreter viewpoint, assignment execution is straightforward. There are functions

  • Operand zary(int op);
  • Operand unary(int op, Operand opd);
  • Operand binary(int op, Operand lft, Operand rgt);
  • void store(Operand *dest, Operand opd);
  • Operand pop(void);
  • void push(int offset);
  • void pushref(int offset);

Operators are implemented as functions with Operand inputs and output. The functions examine field type in their operand(s) and call the appropriate C routines to construct the result.

Among the unary ops are the type names, used as casts which force any other type into the named format if possible, and error otherwise. The unary push is used for variables; constants are just initialized variables. The unary pushref puts a pointer on the stack (used for assignment). The zary pop is only used for cleanup after an error. The interpreter has to call all these functions (and no others) appropriately to do assignments.

Type intv is implemented by carrying the long integer as an array of word-size super-digits. As in normal decimal algorithms, arithmetic on two super-digits may overflow or underflow, carrying a bit to the digits to the left or right. A free implementation of this kind is available in the BIGNUM package.

Type realv is implemented analogously to hardware float. There is a signed int exponent, a significand consisting of an array of unsigned ints, and a signed int length. The leading word and trailing bit of the significand are never 0. The value of the realv is the catenation of the values in the significand as a long integer, times the sign of the length field, times 2 raised to the exponent power. (Test your understanding: how is zero represented?)

The implementation of realv '+' will have to add or subtract pairs of unsigned ints and detect overflow -- that is: a+b>UINT_MAX. You can write a>UINT_MAX-b in C to get a safe form of the check. Arbitrary precision arithmetic carries the need for a shortening convention or operator, otherwise the length continues to grow during computation. The user or system must apply a policy for limiting precision of results.

Types complex and rational are represented by pairs of operands in the expected way. Because the size of Operand would otherwise be unlimited (and therefore not C), the components of rational and complex are in malloc-d storage. Rational arithmetic in fact demands integer components and is always reduced by the gcd after every operation.

Type utf8 gets all the worlds languages into computers. It is worth separating it from type ascii because the extra generality of Unicode slows down execution for string operations.

The main work is in implementing the N^2 combinations of types in binary(op, lft, rgt). One might choose to cast the smaller operand up to the larger operand or forbid mixed operations. Then only N actual operations have to be implemented. One might (as in Java) interpret "+" as catenate for ASCII and UTF8 or alternatively one might convert text to an arithmetic format if possible as in tcl, awk and javascript.

The arithmetic is driven by ops ZARY (no operands), UNARY, BINARY and STORE with subops following the main ops. Operands are addressed with hardware type int giving an offset into the local store.

Suppose some variable a (of whatever type) is stored at offset 3 in the execution frame. Then the following pcode evaluates a=a+a.

UNARY pushref 3
UNARY push 3
UNARY push 3
BINARY add
BINARY store

The main loop of the interpreters looks like:

while (running) {
  switch (*pc++) {
  case ZARY:
    zary(*pc++);
    break;
  case UNARY:
    unary(*pc++);
    break;
  case BINARY:
    binary(*pc++);
    break;
  case GOTO:
    pc = pc+*pc;
  case HALT:
    running = 0;
  default:
    illegalop();
  }
}