At rock bottom we have the machine itself, which we identify as an interpreter, interpM, for machine language M. We are not concerned about what language interpM is written in.
interpC_C, an interpreter of C written in C, is probably not directly executable, but we can imagine executing it by hand. Were it executed, it would be run with two arguments: a program (written in C) and an input to the program.
result = interpC_C prog_C data
To run a program on the machine we use the machine-language
version of the program
result = interpM prog_M data
prog_M and prog_C are required to give
the same results when run on the same data.
The requirement implies an equivalence criterion:
interpM prog_M == interpC_C prog_C
where prog_M is produced by running the compiler on the machine:
prog_M = interpM compCtoM_M prog_C
Substituting for prog_M,
interpM (interpM compCtoM_M prog_C) == interpC_C prog_C
The inner instance of interpM runs the compiler; the outer instance runs the resulting code.
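The equivalence criterion can be checked on a toy model. The sketch below is only an illustration under stated assumptions: the "C" language is arithmetic over a single variable x written as nested tuples, the "machine" is a small stack machine, and the names interp_c, interp_m and comp_c_to_m are hypothetical. The compiler here is an ordinary Python function, standing in for interpM compCtoM_M.

```python
# Toy model, not the systems discussed in the text: "C" is arithmetic over
# one variable x, written as nested tuples; "M" is a small stack machine.

def interp_c(prog_c, data):
    """Evaluate a source expression directly; `data` is the value of x."""
    kind = prog_c[0]
    if kind == "const":
        return prog_c[1]
    if kind == "var":
        return data
    if kind in ("add", "mul"):
        left = interp_c(prog_c[1], data)
        right = interp_c(prog_c[2], data)
        return left + right if kind == "add" else left * right
    raise ValueError(f"unknown node {kind!r}")


def interp_m(prog_m, data):
    """Run stack-machine code; each 'load' instruction pushes `data`."""
    stack = []
    for instr in prog_m:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "load":
            stack.append(data)
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()


def comp_c_to_m(prog_c):
    """Compile a source expression to stack code by a postorder walk."""
    kind = prog_c[0]
    if kind == "const":
        return [("push", prog_c[1])]
    if kind == "var":
        return [("load",)]
    if kind in ("add", "mul"):
        return comp_c_to_m(prog_c[1]) + comp_c_to_m(prog_c[2]) + [(kind,)]
    raise ValueError(f"unknown node {kind!r}")


# Source program for x * x + 1, and its compiled form.
prog_c = ("add", ("mul", ("var",), ("var",)), ("const", 1))
prog_m = comp_c_to_m(prog_c)

# The criterion: running the compiled code agrees with interpreting the source.
for x in range(-5, 6):
    assert interp_m(prog_m, x) == interp_c(prog_c, x)
```

In this model the outer interpM of the formula is interp_m, and the inner one is hidden inside the call to comp_c_to_m.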
One example of a program written in C is the compiler itself. It can be built for use on the machine thus:
compCtoM_M = interpM compCtoM_M compCtoM_C
A compiler for an extended language CC, also written in C, can be built the same way:
compCCtoM_M = interpM compCtoM_M compCCtoM_C
Since CC is an extension of C, a compiler for it written in
C can equally well be regarded as being written in CC:
compCCtoM_CC = compCCtoM_C
To confirm that the compiler works we use it to
recompile itself.
compCCtoM_M' = interpM compCCtoM_M compCCtoM_CC
Although code compiled by the recompiled version (distinguished by the ' symbol)
should be the same as that compiled by the previous version, the
machine-language texts of the two versions themselves can't be expected
to be identical, because they were
made by different compilers (compCCtoM vs compCtoM). However,
if we do another round of recompilation,
compCCtoM_M'' = interpM compCCtoM_M' compCCtoM_CC
we expect the texts of the ' and '' versions to agree because they were made
by the same compiler (compCCtoM). Even so, one should be aware
that it's possible to hide magic in a compiler so it might,
for example, update an embedded version number upon
recompilation.
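A program p1 is a specialization of a program p2 if, for some fixed value x0 of the first input and for every y,
p1 y == p2 x0 y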
For example the squaring function may be defined to
be a specialization of a general integer-power function
pow :: Int -> Double -> Double
square = pow 2
This naive definition, however, is not likely to be more efficient than calling the power function directly. To get a good square, we need the services of a specializer, called mix for historical reasons. Roughly speaking,
square = mix pow 2
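To make the difference between the naive square and a specialized one concrete, here is a minimal sketch. It is not a general mix: mix_power is a hypothetical specializer written by hand for one program only, and power, mix_power and load are names assumed for illustration (power is used instead of pow to avoid shadowing the Python built-in). Given a fixed exponent, it emits residual source text in which the loop has been unrolled.

```python
def power(n, x):
    """General integer power: x**n for n >= 0, by repeated multiplication."""
    result = 1.0
    for _ in range(n):
        result *= x
    return result


def mix_power(n):
    """Hand-written specializer for `power` only (an assumption of this
    sketch): for a fixed n, emit source text with the loop unrolled."""
    body = " * ".join(["x"] * n) if n > 0 else "1.0"
    return f"def residual(x):\n    return {body}\n"


def load(source):
    """Turn residual source text into a callable, playing the interpreter."""
    namespace = {}
    exec(source, namespace)
    return namespace["residual"]


square_src = mix_power(2)   # text of the residual program
square = load(square_src)   # runnable residual program

# The residual agrees with the general program at the fixed exponent.
assert square(3.0) == power(2, 3.0)
```

The residual text for n = 2 contains a single multiplication and no loop, which is what we mean by a good square.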
A program with two inputs would normally be run by an interpreter thus
result = interp prog data1 data2
(Our
original definition of an interpreter allowed only
one data argument. That argument could be a tuple, which we
show here in curried form.)
The specializer converts the program to a residual program
resid = mix prog data1
We can run the residual thus
result = interp resid data2
The faithfulness criterion that the specializer
needs to respect is that these two
computations yield the same result for every
possible data2.
result = interp prog data1 data2
result = interp resid data2
Upon replacing resid by its definition, and omitting
data2 from both sides, the faithfulness criterion becomes
interp prog data1 = interp (mix prog data1)
We note that mix prog takes any chosen data1 and makes a version of
prog specialized for that datum. Thus we may call mix prog
a particularized specializer.
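The faithfulness criterion can be exercised with the simplest possible specializer, one that merely freezes the first argument and optimizes nothing. The sketch below assumes that programs are Python callables and that interp just applies them; interp, mix, prog and mix_prog are illustrative names, not a real specializer.

```python
from functools import partial


def interp(prog, *data):
    """Stand-in interpreter: a 'program' is just a Python callable."""
    return prog(*data)


def mix(prog, data1):
    """Trivial specializer (an assumption of this sketch): freeze the
    first argument of `prog` without optimizing anything."""
    return partial(prog, data1)


def prog(data1, data2):
    """An arbitrary two-input program used only for the check."""
    return data1 * 10 + data2


resid = mix(prog, 4)

# Faithfulness: both computations agree for every data2 we try.
for data2 in range(10):
    assert interp(prog, 4, data2) == interp(resid, data2)

# mix with prog fixed is the particularized specializer for prog.
mix_prog = partial(mix, prog)
assert interp(mix_prog(7), 2) == interp(prog, 7, 2)
```

A real specializer satisfies the same equations but does useful work when it builds the residual, as in the unrolling of a power loop.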
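An interpreter is itself a program with two inputs, a program and its data, so it too can be specialized, here with respect to a given program:
prog' = mix interp prog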
The specialized interpreter can be run thus
result = interp prog' data1 data2
This doesn't look like we've gotten anywhere: we now
run prog' instead of prog; and
prog' probably contains some Cheshire grin of
interp in addition to prog.
However, if we decorate the formula with what languages are involved at each stage, we see an interesting possibility. prog might have been in some new language L, while the interpreter is written in C, and the specializer is for C. Since prog' is in C, we can call it progC.
progC = mixC interpL_C progL
Notice that mixC interpL_C does exactly what a compiler
compLtoC_C would have done. Thus we have schematically
compLtoC_C == mixC interpL_C
However, this compiler is fairly expensive: it requires the interpreter as a hidden input to every compiler run. But an input that is present in every run is itself a candidate for specialization, which suggests specializing the specializer with respect to the interpreter:
mix mix interp
Here mix mix takes an interpreter as argument.
The residual program created by the outer
mix is a compiler because
mix interp is a compiler.
Hence mix mix is a maker of compilers, or
a compiler-compiler. It can make a compiler for
any language for which an interpreter exists, provided
the interpreter is written
in the language processed by mix. (In our
examples that language has been C.)
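The way the pieces fit together can be traced with a trivial specializer that only freezes an argument. This is a schematic sketch under assumptions: the postfix language L, interp_l, and the variable names are all hypothetical, and because this mix performs no optimization the "compiled" program still carries the whole interpreter with it.

```python
from functools import partial


def mix(prog, data1):
    """Trivial specializer (an assumption of this sketch): freezes the
    first argument of `prog` and optimizes nothing."""
    return partial(prog, data1)


def interp_l(prog_l, x):
    """Interpreter for a toy postfix language L. `prog_l` is a string of
    space-separated tokens: integers, 'x' (the input), '+' and '*'."""
    stack = []
    for tok in prog_l.split():
        if tok == "x":
            stack.append(x)
        elif tok in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if tok == "+" else left * right)
        else:
            stack.append(int(tok))
    return stack.pop()


prog_l = "x x * 1 +"  # computes x * x + 1

# mix interp: a compiler from L.
compiler = partial(mix, interp_l)
target = compiler(prog_l)

# mix mix: a compiler-compiler; given an interpreter it yields a compiler.
compiler_compiler = partial(mix, mix)
compiler2 = compiler_compiler(interp_l)
target2 = compiler2(prog_l)

# Every route gives the same answer as direct interpretation.
for x in range(5):
    assert target(x) == target2(x) == interp_l(prog_l, x)
```

With a specializer that really optimizes, target would contain only the arithmetic of prog_l, and compiler2 would be a stand-alone compiler that no longer needs the interpreter as a hidden input.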
| first projection  | mix prog   | particularized specializer |
| second projection | mix interp | compiler                   |
| third projection  | mix mix    | compiler-compiler          |