define(_,`dnl')_ unobtrusive dnl, used for readability
_
_ Pure macros as a programming language
_
_ m4 is Turing complete even when stripped to the bare minimum
_ of one builtin: `define'. This is not news; Christopher Strachey
_ demonstrated it in his ancestral GPM, described in "A general
_ purpose macrogenerator", The Computer Journal 8 (1965) 225-241.
_
_ This program illustrates the fact with macros for bit-by-bit
_ binary arithmetic and string comparison. Comments suggest how
_ the demonstrated techniques can be used to simulate a typical
_ computer. A largely functional programming style is supported
_ by some unusual m4 idioms:
_
_ 1. Case-switching via macro names constructed on the fly.
_ 2. Case-switching by redefining macros.
_ 3. Representing data structures by nested parenthesized lists.
_ 4. Using macros as associative memory.
_
_ Note. There is a `dnl' on every line of this program. The
_ indulgence of using this builtin is logically unnecessary.
_ However, stripping the program of `dnl's would defeat its
_ didactic purpose. The program would consist entirely of
_ `define's with no line breaks. Its behavior would be the
_ same; but its presentation would be incomprehensible.
_
_ A case switch (Idiom 1)
_
_ An essential feature of programming is conditional execution.
_ Suppose there are predicates that yield `T' or `F' for true
_ or false. A conditional switch of the form
_
_ if(,`', `')
_
_ constructs calls for `if_T, or `if_F' that select the
_ appropriate action:
_
define(if,`if_$1(`$2',`$3')')_
define(if_T,`$1')_
define(if_F,`$2')_
_
_ Example
_
_ if(T,`yes',`no') => if_T(`yes',`no') => yes
_ if(F,`yes',`no') => if_F(`yes',`no') => no
_
_ Basic Boolean functions are easy to define in terms of `if':
_
define(not,`if($1,`F',`T')')_
define(and,`if($1,`$2',`F')')_
define(or,`if($1,`T',`$2')')_
define(xor,`if($1,`not($2)',`$2')')_
_
_ List representation (Idiom 3)
_
_ In order to provide access to individual members, sequences
_ of data values may be represented as right-associated lists.
_ In particular, binary integers may be represented in this
_ manner:
_
_ (1,(0,(1,(1,()))))
_
_ To facilitate arithmetic algorithms, the presentation
_ is little-endian. In customary base-2 notation the
_ example becomes 1101 (decimal 13). An empty list `()' acts
_ as terminator. Non-significant 0's (leading zeros in
_ customary notation) are not allowed;
_ the value zero is represented by the empty list.
_
_ Macros as associative memory (Idiom 4)
_
_ Example
_
_ define(thirteen,`(1,(0,(1,(1,()))))')_
_
_ Individual list elements may be accessed by a pun, in which
_ the outer parentheses of a list are taken to be the
_ argument-list delimiters in a macro call. Two basic macros
_ use this scheme to extract "head" and "tail" parts,
_ and , from a nonempty list:
_
_ head((,)) ==>
_ tail((,)) ==>
_
define(head,`_head$1')_
define(_head,`$1')_
_
define(tail,`_tail$1')_
define(_tail,`$2')_
_
_ Example
_
_ head(thirteen) ==> _head(1,(0,(1,(1,())))) => 1
_ tail(thirteen) ==> _tail(1,(0,(1,(1,())))) => (0,(1,(1,()))
_
_ (In showing the progress of macro expansions => denotes a
_ single step; ==> denotes multiple steps.)
_
_ According to the rules of m4, `head' and `tail' also work on
_ the empty list, `()',in which case either function yields an
_ empty string, `'. This property of `head' will be exploited.
_
_ A digit-equality test, `eqd', is essential for arithmetic. It
_ switches on two arguments chosen from the set {`', `0', `1'}
_ and yields `T' or `F' according as they are or are not equal.
_ The empty digit `' arises from expressions such as `head(())'.
_ `eqd' switches on its two arguments in the same way that `if'
_ does on one argument. (Notice the empty digits in `eqd__'.)
_
define(eqd,`eqd_$1_$2()')_
define(eqd__,`T')_
define(eqd__0,`F')_
define(eqd__1,`F')_
_
define(eqd_0_,`F')_
define(eqd_0_0,`T')_
define(eqd_0_1,`F')_
_
define(eqd_1_,`F')_
define(eqd_1_0,`F')_
define(eqd_1_1,`T')_
_
_ Example
_
_ eqd(1,0) => eqd_1_0() => F
_
_ The () appended by `eqd' is defensive. If () were
_ omitted, then would mistakenly be eaten in a
_ rare juxtaposition like
_
_ eqd(1,0)() => eqd_1_0() => F
_
_ A common use of `eqd' that merits a name of its own is
_ a predicate for testing whether a number is zero:
_
define(iszero,`eqd(head($1),`')')_
_
_ Example
_
_ iszero(()) ==> T
_ iszero((0,(1,())) ==> F
_
_ Functions `if', `head', `tail', and `eqd' form a sufficient
_ basis for programming arithmetic on binary numbers. Here
_ is the successor function. Succ() yields +1.
_
define(Succ,`if(eqd(head($1),`'),`(1,())','_
``if(eqd(head($1),0),`(1,tail($1))',`(0,Succ(tail($1)))')')')_
_
_ (The name `Succ' is capitalized to distinguish it from a
_ different implementation that will soon be described. Right
_ and left quotes before and after `_' exclude it from
_ the replacement text of `Succ' by bringing it to top level
_ where it gets expanded and disappears.)
_
_ Example
_
_ Succ(Succ(Succ(()))) ==> (1,(1,()))
_
_ `Succ' expands in different ways depending on the head digit.
_ Each alternative is presented in a different context: as the
_ first branch of an `if', as the first branch of a nested `if',
_ and as the second alternative of a nested `if'. It would be
_ notionally cleaner to use a 3-way switch.
_
_ Unfortunately, overt switch code relies on Idiom 1, a glaring
_ deviation from the mainstream functional style employed in
_ `Succ'. Fortunately, a remedy is at hand: macros that hide
_ the idiom.
_
_ A general pattern for case-switching on the head of the
_ operand of a unary operation, , is
_
_ define(,`_(head($1))($1)')_
_ define<_,`_$1')_
_
_ Then () will expand to a call for a macro that
_ handles the case where head()=:
_
_ _()
_
_ The pattern is captured in the macro `cases1()'. (Note
_ that the definition of the auxiliary macro `_cases1' extends
_ over two lines.)
_
define(cases1,`_cases1(dol(1),`$1')')_
define(_cases1,`define($2,`_$2(head($1))($1)')'_
`define(_$2,`$2_$1')')_
define(dol,`$$1')_
_
_ The magic here is the "dollar macro",
_
_ dol(1) => $1
_
_ which results in `$1' being substituted for `$1' [sic] throughout
_ the replacement text of `_cases1', while is substituted for
_ `$2'. When is `succ', the expansion proceeds thus:
_
_ cases1(succ) => _cases1(dol(1),`succ') =>
_ define(succ,`_succ(head($1))($1)')define(_succ,`succ_$1')
_
_ Code for individual cases of the successor function remains
_ to be supplied.
_
cases1(succ)_
define(succ_,`(1,())')_
define(succ_0,`(1,tail($1))')_
define(succ_1,`(0,succ(tail($1)))')_
_
_ Example
_
_ succ((1,()) => _succ(1,())((1,())) => succ_1((1,())) =>
_ (0,succ(tail((1,())))) ==> (0,succ(()) ==> (0,(1,()))
_
_ Some small constants may be computed and saved for future
_ use (Idiom 4).
_
define(zero,())_
define(one,succ(zero))_
define(two,succ(one))_
_
_ For human consumption, a macro, `base2' will convert results
_ from lugubrious list form to ordinary binary notation.
_
define(base2,`if(iszero($1),`0',`_base2($1)')')_
cases1(_base2)_
define(_base2_,`')_
define(_base2_0,`_base2(tail($1))0')_
define(_base2_1,`_base2(tail($1))1')_
_
_ Example
_
_ base2(zero) ==> 0
_ base2(thirteen) ==> 1101
_
_ A counter based on `succ' may be held in a macro whose content is
_ updated by a macro, `incr'. or read out simply by expanding it.
_
define(incr,`define(`$1',succ($1))')_
_
_ Example
_
_ define(mycounter,())
_ incr(`mycounter')
_ incr(`mycounter')
_ base2(mycounter) ==> 10
_
_ Binary operations may be defined by switching on two
_ parameters. The switching code, generated by a macro
_ `cases2', is like that generated by `cases1', with two
_ calls of the dollar macro instead of one.
_
define(cases2,`_cases2(dol(1),dol(2),$1)')_
define(_cases2,`define($3,`_$3(head($1),head($2))($1,$2)')'_
`define(_$3,`$3_$1_$2')')_
_
_ Here `cases2' is used to define a Boolean operator via a
_ truth table:
_
cases2(nor)_
define(nor_F_F,`T')_
define(nor_F_T,`F')_
define(nor_T_F.`F')_
define(nor_T_T,`F')_
_
_ All the boilerplate in this 5-line definition could be
_ hidden in a trivial macro, to be used like this:
_
_ defbool(nand,T,T,T,F)_
_
_ For many basic arithmetic operations, it's convenient to
_ switch on only the first of two operands:
_
define(cases1of2,`_cases1of2(dol(1),dol(2),$1)')_
define(_cases1of2,`define($3,`_$3(head($1))($1,$2)')'_
`define(_$3,$3_$1)')_
_
_ Addition and multiplication are straightforward:
_
cases1of2(add)_
define(add_,`$2')_
define(add_0,`(head($2),add(tail($1),tail($2)))')_
define(add_1,`succ(add_0($1,$2))')_
_
define(mpy,`if(iszero($2),`zero',`_mpy($1,$2)')')_
cases1of2(_mpy)_
define(_mpy_,zero)_
define(_mpy_0,`(0,_mpy(tail($1),$2))')_
define(_mpy_1,`add($2,_mpy_0($1,$2))')_
_
_ Example
_
_ base2(mpy(add(one,two),two)) ==> 110
_
_ Switch generators follow a stereotyped pattern that
_ might be captured in a higher-level maker of switch
_ generators, to be used like this:
_
_ switchgen(,,)
_
_ where is a list of parameter numbers to
_ switch on, and is a list of all parameter numbers.
_ Then switch generators defined above could be written
_ without attention to their tricky implementation:
_
_ switchgen(cases1,(1,()),(1,()))')
_ switchgen(cases1of2,(1,()),(1,(2,())))')
_
_ Other data types
_
_ In principle a string-equality operator declared
_ by `cases2(eqs)' could be programmed to yield `T' or
_ `F' for examples like
_
_ eqs((W,(O,(R,(D,(1,()))))),(W,(O,(R,(D,(2,()))))))
_
_ but it would require a daunting N^2 = 3969 case macros,
_ where N counts uppercase, lowercase and numeric characters
_ plus the empty character. Fortunately there is a trick
_ that reduces the number of case macros to N = 63.
_
_ For every character define a case macro `eqc_' to
_ yield `F'. Here's a small sample:
_
define(eqc_A,`F')_
define(eqc_B,`F')_
define(eqc_C,`F')_
define(eqc_D,`F')_
_ ...
define(eqc_,`F')_
_
_ Then `eqc(,**)' redefines `eqc_****' to yield `T' and
_ calls `eqc_****' (Idiom 2). The result is `T' only if ****
_ and **** are the same. Finally `eqc' restores the definition
_ of `eqc_****'. The only subtlety in the definition of `eqc'
_ is the empty quotes that keep the result of `eqc_$2' from
_ being tacked onto `define':
_
define(eqc,`define(`eqc_$1',`T')eqc_$2`'define(`eqc_$1',`F')')_
_
_ Example
_
_ eqc(A,B) => define(`eqc_A',`T')eqc_B`'define(`eqc_A',`F')
_ => eqc_B`'define(`eqc_A',`F') => F`'define(`eqc_A',`F') => F
_
_ eqc(A,A) => define(`eqc_A',`T')eqc_A`'... ==> T
_
_ In terms of `eqc', string-comparison becomes
_
define(eqs,`if(not(eqc(head($1),head($2))),`F''_
``if(eqc(head($1),`'),`T',`eqs(tail($1),tail($2))')')')_
_
_ Comparing characters for alphabetic order is harder. It
_ could be done by a 3969-way switch. Or one might iterate
_ over the the alphabet to see which character comes first -
_ a time-consuming process. With the help of N macros that
_ tell the numerical position of each of the N characters in
_ the alphabet, character comparison becomes numeric
_ comparison, which takes O(log N) steps. I'd love to hear
_ any better ideas.
_
_ Universality
_
_ It's not hard to envision simulating a simple computer, except
_ for interactive input-out--a deficiency that is shared with
_ Turing machines. Thus, aside from resource limitations, bare
_ m4 is Turing complete.
_
_ Here's a workable plan:
_
_ The program counter is maintained by `incr', and converted
_ to string form by `base2'.
_
_ Word addresses are binary mumbers.
_
_ The contents of word n are held in a macro named W, where
_ is the string form of the address.
_
_ Words need not be limited in size. Numbers may be kept
_ in the form (sign,magnitude). (Caveat: the sign can't
_ be `+' or `-', because programs can only switch on
_ alphanumerics.)
_
_ Instructions may be of variable length--one word of opcode,
_ and following words that contain immediate operands or
_ addresses expressed in binary form.
_
_ Opcodes are a limited set of binary numbers , and may
_ be switched on by converting them to string form I.
_
_ An assignment redefines the designated word.
_
_ Other instructions are programmed in styles illustrated above.
_
_ Each instruction updates the location counter appropriately.
_
_ A control macro fetches an opcode according to the location
_ counter. If the opcode is a halt instruction, the control
_ terminates. Otherwise it calls the designated operation,
_ then calls the control recursively.
_
_ Advice for m4
_
_ Recursion of the control macro is unlimited and unavoidable.
_ Unfortunately few, if any, m4 implementations handle tail
_ calls so as not to grow the program stack. Doing so would
_ help this and other deeply recursive applications.
_
_ M. Douglas McIlroy
_ Dartmouth College
_ August 2020
_ January 2021, comments tweaked
**