The magic mix: interpreter, compiler, specializer

We are concerned with three kinds of language processor.
Compiler
A compiler, given a program in one language, say C, makes an equivalent program in another language, M (think machine language).
Interpreter
An interpreter, given a program and input data for that program, computes the program's result.
Specializer
A specializer, given a program with several inputs, some of which are known in advance, creates a (hopefully more efficient) equivalent program that requires only the remaining inputs.
As we shall see, a remarkable relationship exists among these three kinds of processor: a compiler is a specialized interpreter.
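
It may help to see the shapes of the three processors as type signatures. The following is a hypothetical Haskell sketch, not part of the development proper; the names, and the choice to represent programs, data, and results as plain text, are assumptions made only for illustration.

      -- Illustrative shapes only; the bodies are placeholders.
      type Program = String   -- source text of a program
      type Data    = String   -- input to a program
      type Result  = String   -- output of a program

      compile    :: Program -> Program           -- C program in, M program out
      interpret  :: Program -> Data -> Result    -- program and its input in, result out
      specialize :: Program -> Data -> Program   -- program and a known input in, residual program out

      compile    = undefined
      interpret  = undefined
      specialize = undefined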

Notation

prog_C
a program written in C
prog_M
a program written in M (machine language)
interpC_C
interpreter for C written in C
interpC_M
interpreter for C written in M
compCtoM_C
compiler from C to M written in C
compCtoM_M
compiler from C to M written in M
We refer to C and machine language only for concreteness; in fact C and M could be any two Turing-complete languages.

At rock bottom we have the machine itself, which we identify as an interpreter. We are not concerned about what language interpM is written in.

interpM
interpreter for machine language

interpC_C, an interpreter for C written in C, is probably not directly executable, but we can imagine executing it by hand. Were it executed, it would be run with two arguments: a program (written in C) and an input to that program.

     result = interpC_C prog_C data
To run a program on the machine we use the machine-language version of the program:
     result = interpM prog_M data
prog_M and prog_C are required to give the same results when run on the same data. The requirement implies an equivalence criterion:
	interpM prog_M == interpC_C prog_C
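
Written with the data argument restored, and reusing the hypothetical Haskell conventions sketched earlier, the criterion says the two interpreters agree on every input (interpM and interpC_C here are placeholders, not real implementations):

      -- Equivalence criterion, stated pointwise over an arbitrary input d.
      equivalentOn :: Program -> Program -> Data -> Bool
      equivalentOn progM progC d = interpM progM d == interpC_C progC d
        where
          interpM, interpC_C :: Program -> Data -> Result
          interpM   = undefined   -- the machine
          interpC_C = undefined   -- the C interpreter, executed by hand or otherwise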

Compiler

To run the compiler, we apply the machine to it and supply a program as input:
     prog_M = interpM compCtoM_M prog_C
Substituting for prog_M in the equivalence criterion gives us a faithfulness criterion for the compiler:
	interpM interpM compCtoM_M == interpC_C
The inner instance of interpM runs the compiler, the outer instance runs the resulting code.

One example of a program written in C is the compiler itself. Given a running machine-language compiler to start from, it can be built for use on the machine thus:

     compCtoM_M = interpM compCtoM_M compCtoM_C

Bootstrapping

We can extend the language C to CC (think C++) by modifying the compiler's source code compCtoM_C to make compCCtoM_C. The machine-executable version is built thus:
     compCCtoM_M = interpM compCtoM_M compCCtoM_C
Since CC is an extension of C, a compiler for it written in C can equally well be regarded as being written in CC:
     compCCtoM_CC = compCCtoM_C
To confirm that the compiler works we use it to recompile itself.
     compCCtoM_M' = interpM compCCtoM_M compCCtoM_CC
Although code compiled by the recompiled version (distinguished by the ' symbol) should be the same as that compiled by the previous version, the machine-language texts of the two versions themselves can't be expected to be identical, because they were made by different compilers (compCCtoM vs compCtoM). However, if we do another round of recompilation
     compCCtoM_M'' = interpM compCCtoM_M' compCCtoM_C
We expect the texts of the ' and '' versions to agree because they were made by the same compiler (compCCtoM). However, one should be aware that it's possible to hide magic in a compiler so it might, for example, update an embedded version number upon recompilation.
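
The whole bootstrap can be laid out as a chain of three compilations followed by a comparison of the second and third generations. The sketch below is hypothetical Haskell in the same style as before; runM stands in for handing a machine-language compiler and a source text to the machine, and every binding is a placeholder.

      -- Schematic bootstrap check.
      bootstrapOK :: Bool
      bootstrapOK = gen2 == gen3
        where
          runM :: Program -> Program -> Program   -- machine runs a compiler on a source text
          runM = undefined

          compCtoM_M, compCCtoM_C :: Program
          compCtoM_M  = undefined                 -- existing C compiler, in machine language
          compCCtoM_C = undefined                 -- new CC compiler, written in C (equally, CC)

          gen1 = runM compCtoM_M compCCtoM_C      -- compCCtoM_M:   built by the old compiler
          gen2 = runM gen1       compCCtoM_C      -- compCCtoM_M':  recompiled with itself
          gen3 = runM gen2       compCCtoM_C      -- compCCtoM_M'': texts of gen2 and gen3 should agree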

Specializer

Suppose we know we are going to use a program p2 x y of two arguments many times with argument x always having the value x0. We may use a specializer to create a residual program p1 y such that
     p1 y == p2 x0 y
For example, the squaring function may be defined to be a specialization of a general integer-power function
        pow :: Int -> Double -> Double
	square = pow 2
This naive definition, however, is not likely to be more efficient than calling the power function directly. To get a good square, we need the services of a specializer, called mix for historical reasons. Roughly speaking
      square = mix pow 2
mix needs to work on the implementation of pow, not merely arrange to call it. For convenience, we assume it works on source code. For further simplicity we consider every program to be in C. To execute such a program, we feed it and its data to interpC_C, which we abbreviate to interp.
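
To make the difference concrete, here is a small hand-written Haskell illustration (not the output of any real specializer): the general power function, the naive partial application, and the kind of residual code a specializer could plausibly emit for the known exponent 2.

      -- General integer-power function (non-negative exponents).
      pow :: Int -> Double -> Double
      pow 0 _ = 1
      pow n x = x * pow (n - 1) x

      -- Naive specialization: every call still runs the general recursion.
      squareNaive :: Double -> Double
      squareNaive = pow 2

      -- Residual program a specializer might emit: the recursion on the known
      -- exponent has been unrolled away, leaving straight-line code.
      square :: Double -> Double
      square x = x * (x * 1)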

A program with two inputs would normally be run by an interpreter thus

      result = interp prog data1 data2
(Our original definition of interpreter allowed only one data argument. That argument could be a tuple, which we show here in curried form.)

The specializer converts the program to a residual

      resid = mix prog data1
We can run the residual thus
      result = interp resid data2
The faithfulness criterion that the specializer needs to respect is that these two computations yield the same result, for every possible data2.
      result = interp prog data1 data2
      result = interp resid data2
Upon replacing resid by its definition, and omitting data2 from both sides, the faithfulness criterion becomes
        interp prog data1 == interp (mix prog data1)
We note that
      mix prog
takes any chosen data1 and makes a version of prog specialized for that datum. Thus we may call mix prog a particularized specializer.
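
In the same hypothetical Haskell style as earlier, the faithfulness criterion can be written as a property relating mix and interp; all names are illustrative placeholders.

      -- Faithfulness: specializing on data1 and then running the residual on data2
      -- agrees with running the original program on both inputs.
      faithful :: Program -> Data -> Data -> Bool
      faithful prog data1 data2 =
          interp prog data1 data2 == interp (mix prog data1) data2
        where
          interp :: Program -> Data -> Data -> Result   -- interpC_C, curried over two data arguments
          interp = undefined
          mix :: Program -> Data -> Program             -- the specializer
          mix = undefined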

Compiler as specialized interpreter

Since interp is a program with two inputs (the program to be interpreted and its data), we can specialize it for a given program:
      prog' = mix interp prog
The specialized interpreter can be run thus
      result = interp prog' data1 data2
This doesn't look like we've gotten anywhere: we now run prog' instead of prog; and prog' probably contains some Cheshire grin of interp in addition to prog.

However, if we decorate the formula with what languages are involved at each stage, we see an interesting possibility. prog might have been in some new language L, while the interpreter is written in C, and the specializer is for C. Since prog' is in C, we can call it progC.

      progC = mixC interpL_C progL
Notice that mixC interpL_C does exactly what a compiler compLtoC_C would have done. Thus we have schematically
	compLtoC_C == mixC interpL_C
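
Read this way, the compiler is literally a partial application of the specializer to the interpreter. A schematic rendering in the same hypothetical Haskell style (mixC and interpL_C are placeholders):

      -- Second projection, schematically.
      mixC :: Program -> Data -> Program   -- specialize a C program on one known input
      mixC = undefined

      interpL_C :: Program                 -- interpreter for L, written in C
      interpL_C = undefined

      compLtoC :: Program -> Program       -- L source in, C source out; the L source
      compLtoC = mixC interpL_C            -- is simply mixC's known input datum
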
However, this compiler is fairly expensive: it requires the interpreter as a hidden input to every compiler run. An input that is present in every run is itself a candidate for specialization.

Compiling the compiler

If we want to write all programs in one language, interp doesn't need to be fed to mix at every run. It can instead be specialized away once and for all, by applying mix to mix itself with interp as the known input:
      mix mix interp
Here mix mix takes an interpreter as argument. The residual program created by the outer mix is a compiler because mix interp is a compiler. Hence mix mix is a maker of compilers, or a compiler-compiler. It can make a compiler for any language for which an interpreter exists, provided the interpreter is written in the language processed by mix. (In our examples that language has been C.)
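
Schematically, and glossing over the fact that the inner mix must be handed to the outer one as source text, the compiler-compiler is again just a partial application. The sketch below uses the same hypothetical conventions; mixSrc stands for the source text of mix itself.

      -- Third projection, schematically.
      mix :: Program -> Data -> Program       -- the specializer (for C, written in C)
      mix = undefined

      mixSrc :: Program                       -- mix's own source text
      mixSrc = undefined

      compilerCompiler :: Program -> Program  -- interpreter source in, compiler source out
      compilerCompiler = mix mixSrc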

Summary

We have now seen all three of the famous Futamura projections:
     first projection     mix prog      particularized specializer
     second projection    mix interp    compiler
     third projection     mix mix       compiler-compiler
The view of compilation as "partial evaluation" was expounded by Ershov in the mid 1960s. Futamura formulated the idea more precisely, in terms of specialization, about 1970. Ershov also spoke of "mixed computation", wherein some of the work of interpretation is done at compile time from source language, and some at run time from the residual machine language; whence the name mix.