define(`_',`dnl')_ unobtrusive dnl, used for readability define(`__',`')_ indentation macros, also for readability define(`____',`')_ think of them as spaces that don't define(`______',`')_ stick around and get in the way _ _ Pure macros as a programming language _ _ m4 is Turing complete even when stripped to the bare minimum _ of one builtin: `define'. This is not news; Christopher Strachey _ demonstrated it in his ancestral GPM, described in "A general _ purpose macrogenerator", The Computer Journal 8 (1965) 225-241. _ _ This m4 program illustrates the fact with macros for computing _ with binary integers and outlines how to use the macros to _ simulate a typical digital computer. The macros realize a _ functional programming style by exploiting idioms that are _ rare or nonexistent in mainstream programming languages. _ _ 1. Case-switching via macro names constructed on the fly. _ 2. Equality testing by redefining macros. _ 3. Representing data structures by nested parenthesized lists. _ 4. Using macros as associative memory. _ 5. Inserting parameter symbols, $1 etc, into nested calls _ of `define' on the fly. _ _ Note. There is a `dnl' on every line of this program. The _ indulgence of using this builtin is logically unnecessary. _ Stripped of `dnl's and indentation macros, the program would _ consist entirely of `define's with no comments or line breaks. _ It would be about 1/5 the current size and would yield the _ same results, but would be inscrutable. _ _ In examples of stepwise macro expansion, all occurrences of _ `dnl' and indentation have been elided. _ _ For consistency, we quote all arguments of `define' unless _ they contain macro calls that must be expanded at the _ time of definition. _ _ Case switch (Idiom 1) _ _ An essential feature of programming is conditional execution. _ Suppose there are predicates that yield `T' or `F' for true _ or false. A conditional switch of the form _ _ if(,`', `') _ _ constructs calls for `if_T', or `if_F' that select the _ appropriate action: _ define(`if', `if_$1(`$2',`$3')')_ define(`if_T', `$1')_ define(`if_F', `$2')_ _ _ Example _ _ if(T,`yes',`no') => if_T(`yes',`no') => yes _ if(F,`yes',`no') => if_F(`yes',`no') => no _ _ (An arrow => signifies a single macro-expansion step.) _ _ Again for consistency, we quote the second and third arguments _ of `if' unless necessity dictates otherwise. _ _ Basic Boolean functions are easy to define in terms of `if': _ define(`not', `if(`$1', `F', `T')')_ define(`and', `if(`$1', `$2', `F')')_ define(`or', `if(`$1', `T', `$2')')_ _ _ List representation (Idiom 3) _ _ In order to provide access to individual members, sequences _ of data values may be represented as right-associated lists. _ In particular, binary integers may be represented in this _ manner: _ _ (1,(0,(1,(1,())))) _ _ To facilitate arithmetic algorithms, the presentation _ is little-endian. In customary base-2 notation the _ example becomes 1101 (decimal 13). An empty list `()' acts _ as terminator. Non-significant 0's (leading zeros in _ customary notation) are not allowed. Zero is represented _ by the empty list, `()'. Other values end in (1,()). _ _ Macros as associative memory (Idiom 3) _ _ Data may be saved under arbitrary names.Here we give _ more readable names for some small numbers. Quotes have _ been omitted from the definiens for `two' so that `one' _ gets expanded at definition time rather than at each _ call of `two', paying some space to save time. _ _ define(`zero', `()')_ define(`one', `(1,())')_ define(`two', (0,one))_ _ _ Individual list elements may be accessed by a pun, in which _ the outer parentheses of a list are taken to be the _ argument-list delimiters in a macro call. Two basic macros _ use this scheme to extract "head" and "tail" parts, _ and , from a nonempty list: _ _ head((,)) ==> _ tail((,)) ==> _ _ (A long arrow ==> signifies more than one macro expansion.) _ define(`head', `_head$1')_ define(`_head', `$1')_ _ define(`tail', `_tail$1')_ define(`_tail', `$2')_ _ _ Example _ _ head(two) ==> _head(0,(1,())) => 0 _ tail(two) ==> _tail(0,(1,())) => (1,()) _ _ According to the rules of m4, `head' and `tail' also work on _ the empty list, `()', in which case both functions yield an _ empty string, `'. This behavior will be exploited. _ _ Testing equality of primitive values (Idiom 2) _ _ The `if' macro checks which of two possible values (`T' or `F') _ appears as its first argument. More trickery is required to _ check equality of arbitrary primitive values--sequences of _ alphanumerics and underscores. We call such a (possibly empty) _ sequence an "atom". _ _ We first define the outcome of the test of the given atoms to _ be false. Then we define the test of one of the atoms against _ itself to be true. If the two values are the same, the second _ definition replaces the first. Finally we expand the current _ current definition of the original test. _ define(`eq',_ __`define(`eq_$1_$2', `F')'_ __`define(`eq_$1_$1', `T')'_ __`eq_$1_$2()')_ _ _ Examples _ _ eq(a,b) => define(`eq_a_b',`F')define(`eq_a_a',`T')eq_a_b() => F _ eq(a,a) => define(`eq_a_a',`F')define(`eq_a_a',`T')eq_a_a() => T _ _ Unfortunately the temporary definitions stick around as waste _ memory. We could relax our notion of purity, and use the _ builtin macro `undefine' to clean up. _ _ The final () in the definiton of `eq' is defensive. Without _ (), text can mistakenly be tacked onto the expansion of _ `eq_$1_$2'. _ _ eq(a,b)c ==> eq_a_bc _ _ A test for the empty list: _ define(`isempty', `eq(head($1),`')')_ _ _ Equality of lists of nonempty atoms can be tested by `eql', _ which uses `eq' recursively. (The indentation in this _ definition and others like it is intended to imitate a _ sequence of else-less if's.) _ define(`eql',_ __`if(isempty($1), `isempty($2)',_ __`if(eq(head($1),head($2)), `eql(tail($1),tail($2))',_ __`F')')')_ _ _ Example _ _ eql((W,(O,(R,(D,())))), (W,(O,(R,(D,(Y,())))))) _ ==> if(eq(W,W),`if(isempty(...),...)','F') _ ==> if(isempty(...),`T',`eql(tail(...),tail(...))') _ ==> eql((O,(R,(D,()))), (O,(R,(D,(Y,()))))) _ ==> eql(()),(Y,())) _ ==> F _ _ Here we use `eq' to define a predicate for testing whether _ a number, possibly with insignificant zero digits, is zero. _ define(`iszero',_ __`if(isempty($1), `T',_ __`if(eq(head($1),0), `iszero(tail($1))',_ __`F')')')_ _ _ Example _ _ iszero(()) _ ==> if(eq(,),`T',`if(...)') _ ==> if(T,`T','if(...)' ==> T _ _ iszero((0,(1,()))) _ ==> if(eq(0,),...,`if(eq(head((0,(1,()))),`0'),...)') _ ==> iszero(tail(0,(1,()))) _ ==> iszero((1,())) ==> F _ define(`succ',_ __`if(iszero($1),_ ____`one',_ ____`if(eq(head($1),0),_ ______`(1,tail($1))',_ ______`(0,succ(tail($1)))')')')_ _ define(`three', `succ(two)')_ define(`four', `succ(three)')_ _ _ In the definition of `succ', the behavior for each of the _ three possible values of the head digit of the argument _ (empty, `0' and `1') is presented in a different context: _ as the first branch of an `if', as the first branch of _ a nested `if', and as the second branch of a nested `if'. _ Parentheses become hard to match.It would be cleaner to _ use a 3-way switch. _ _ Case-switching on attributes of arguments _ _ Unfortunately, overt switch code relies on Idiom 1, a _ glaring deviation from the mainstream functional style _ used to define `iszero' and `succ'. A remedy is at hand: _ higher-level switch-defining macros that hide Idiom 1. _ _ Switching is based on one or more "decision attributes", _ atomic values derived from the arguments. The type of _ a switch depends on the numbers of decision attributes _ and parameters. _ _ switch decision parameters _ type attributes _ mkswitch1of1 1 1 _ mkswitch1of2 1 2 _ mkswitch2of2 2 2 _ _ Other switch types are named similarly. _ _ Warning. The discussion from here through the definition _ of `mkswitch1of1' may be heavy going. You may wish to skim _ it then skip around to bring the whole into focus. _ _ Each switch declaration defines a function and the _ decision attributes expressed as they will appear _ in the body of the macro to be declared. A switch _ declaration must be accompanied by "case macros" _ for all valid sets of values of the decision _ attributes. _ _ As an indicator of its purpose we declare a case _ macro using `case', a distinctive synonym for `define'. _ define(`case', `define(`$1', `$2')')_ _ _ The name of a case macro comprises the name of the _ original function and values of the decison _ attributes separated by underscores. For example, _ to define `iszero' in switch style, the decision _ attribute is the head of the argument; and we must _ define cases, `iszero_', `iszero_0' and `iszero_1' _ for the three possible values of the attribute. We _ can also define `if' in this style. _ _ mkswitch1of1('iszero',`head($1)')_ _ case(`iszero_', `T')_ _ case('iszero_0', `iszero(tail($1))')_ _ case(`iszero_1', `F')_ _ _ mkswitch1of3(`if',`$1')_ _ case(`if_T', `$2')_ _ case(`if_F', `$3')_ _ _ _ In the definition of the switch-declaring macro, _ `mkswitch1of1', $1 is the name of the function to be _ declared, notionally like this _ _ define(`mkswitch1of1',_ _ __`define(`$1', casename(`$1',attr)(arg))') _ _ where `casename' pastes together the function name and the _ value, `attr', of the decision attribute. The constructed _ name will finally be called with argument `arg'. When $1 _ is `iszero', `attr', will be `head($1)'. and `arg' will be $1. _ Unfortunately $1 already means `iszero'. To resolve the _ clash, we denote the first argument of function $1 as _ `DOL(1)'. This "dollar macro" will be replaced by $1 when _ the inner `define' is called. (Idiom 5). _ define(`DOL',`$$1')_ _ _ Because `DOL()' must be expanded to $ when the inner _ `define' is executed, `DOL()' must be exempt from the _ normal quotes around the body of a `define'. Thus quoting _ must end just before the dollar macro and resume just _ after, so a dollar-macro call typically is flanked by _ outward-facing quotes. _ _ 'DOL()` _ _ As the set of decision atttributes differs among switch type, _ `casename' needs multiple realizations. We call the version _ for switching on just one decision attribute `_mkswitch1'. _ define(`mkswitch1of1',_ __`define(`$1', `_mkswitch1(`$1',$2)('DOL(1)`)')')_ define(`_mkswitch1', `$1_$2')_ _ _ Here is the successor function programmed in switch style. _ mkswitch1of1(`succ', `head($1)')_ case(`succ_', `(1,())')_ case(`succ_0', `(1,tail($1))')_ case(`succ_1', `(0,succ(tail($1)))')_ _ _ Examples _ _ mkswitch1of1(`succ', `head($1)') _ ==> define(`succ', `_mkswitch1(`succ',head($1))($1)') _ _ succ((1,())) _ => _mkswitch1(`succ', head((1,()))((1,()))) _ ==> succ_1((1,())) _ => (0,succ(tail(1,()))) _ ==> (0,succ(())) _ ==> (0,succ_(())) _ => (0,(1,())) _ _ Other combinations of parameter counts and switching _ arguments are need for defining ordinary arithmetic _ operators. The following definitions are staightforward _ variations on `mkswitch1of1'. _ define(`mkswitch2of2',_ __`define(`$1',_ ____`_mkswitch2(`$1',$2,$3)('DOL(1)`,'DOL(2)`)')')_ define(`_mkswitch2', `$1_$2_$3')_ _ define(`mkswitch1of2',_ __`define(`$1',_ ____`_mkswitch1(`$1',$2)('DOL(1)`,'DOL(2)`)')')_ _ define(`mkswitch1of3',_ __`define(`$1',_ ____`_mkswitch1(`$1',$2)('DOL(1)`,'DOL(2)`,'DOL(3)`)')')_ _ define(`mkswitch1of4',_ __`define(`$1',_ ____`_mkswitch1(`$1',$2)('DOL(1)`,'DOL(2)`,'DOL(3)`,'DOL(4)`)')')_ _ _ Here we define some basic arithmetic functions. For _ illustrative purposes, addition is defined by switching _ on both arguments, while multiplication is defined by _ switching on only the first argument. Both definitions _ can produce insignificant zero bits when inputs have _ insignificant zero bits. _ mkswitch2of2(`add', `head($1)',`head($2)')_ case(`add__', `()')_ case(`add__0', `$2')_ case(`add__1', `$2')_ case(`add_0_', `$1')_ case(`add_0_0', `(0,_addtails($1,$2))')_ case(`add_0_1', `(1,_addtails($1,$2))')_ case(`add_1_', `$1')_ case(`add_1_0', `(1,_addtails($1,$2))')_ case(`add_1_1', `(0,succ(_addtails($1,$2)))')_ define(`_addtails', `add(tail($1),tail($2))')_ _ define(`mul', `if(iszero($2), `()', `_mul($1,$2)')')_ mkswitch1of2(`_mul', `head($1)')_ case(`_mul_', `()')_ case(`_mul_0', `(0,_mul(tail($1),$2))')_ case(`_mul_1', `add($2,_mul_0($1,$2))')_ _ _ Miscellaneous useful functions _ _ The `trim' function deletes insignifcant zero bits. _ define(`trim',_ __`if(eq(head($1),`'), `()',_ __`_trim(head($1),trim(tail($1)))')')_ define(`_trim',_ __`if(and(eq($1,0),isempty($2)), `()',_ __`($1,$2)')')_ _ _ For human consumption, `base2' converts results from _ lugubrious list form to ordinary binary notation. _ define(`base2',_ __`if(eq(head($1),`'), `0', `_base2($1)')')_ define(`_base2',_ __`if(eq(head($1),`'), `', `_base2(tail($1))head($1)')')_ _ _ A counter value kept in a macro (per Idiom 3) may be _ updated by `incr' or read out by expanding it. _ define(incr, `define(`$1',succ($1))')_ _ _ Examples _ _ base2((0,(1,(0,())))) ==> 010 _ base2(trim((0,(1,(0,()))))) _ ==> base2((0,(1,()))) ==> 10 _ _ define(mycounter,zero) _ incr(`mycounter') _ incr(`mycounter') _ base2(mycounter) ==> 10 _ _ Comparison functions _ _ Numerical comparison operators may be built on `compare', _ a function that yields `LT', `EQ', or `GT' according as _ its first argument is less than, equal to, or greater _ than the second. The auxiliary function `_compare' breaks _ ties between tails. _ mkswitch2of2(`compare', `head($1)',`head($2)')_ case(`compare__', `EQ')_ case(`compare__0', `if(iszero($2), `EQ',`LT')')_ case(`compare__1', `LT')_ case(`compare_0_', `if(iszero($1), `EQ', `GT')')_ case(`compare_0_0', `compare(tail($1),tail($2))')_ case(`compare_0_1', `_compare(compare(tail($1),tail($2)), LT)')_ case(`compare_1_', `GT')_ case(`compare_1_0', `_compare(compare(tail($1),tail($2)), GT)')_ case(`compare_1_1', `compare(tail($1),tail($2))')_ _ define(`_compare', `if(eq($1,EQ), `$2', `$1')')_ _ _ The customary two-way comparison predicates can be defined _ in terms of `compare'. A typical example is the "less than" _ predicate, `lt'. _ mkswitch1of2(`lt',`compare($1,$2)')_ case(`lt_LT', `T')_ case(`lt_EQ', `F')_ case(`lt_GT', `F')_ _ _ To avoid the tedium of six such declarations, we _ capture their pattern in `mkcomp'. `mkcomp' is a _ third-compare function; it defines definers of _ numerical comparison predicates. _ define(`mkcomp',_ __`mkswitch1of2(`$1',`compare('DOL(1)`,'DOL(2)`)')'_ __`case(`$1_LT', `$2')'_ __`case(`$1_EQ', `$3')'_ __`case(`$1_GT', `$4')')_ _ mkcomp(`lt', `T', `F', `F')_ mkcomp(`le', `T', `T', `F')_ mkcomp(`et', `F', `T', `F')_ equal to (not the same as `eq') mkcomp(`ne', `T', `F', `T')_ mkcomp(`ge', `F', `T', `T')_ mkcomp(`gt', `F', `F', `T')_ _ _ Comparing atoms for, say, alphabetic order is not a _ simple task; we need somehow to decide the question _ consistently for every possible pair of values. The _ function `cmp' does so for single alphanumerics by _ finding which member of the pair comes first in a _ list of the alphabet. _ define(`cmp',_ __`if(eq(`$1',`$2'), `EQ',_ __`_cmp(`$1',`$2',head(alphabet),tail(alphabet))')')_ _ define(`_cmp',_ __`if(eq(`$1', `$3'), `LT',_ __`if(eq(`$2', `$3'), `GT',_ __`if(eql(`$4', `()'), `?',_ __`_cmp(`$1',`$2',head(`$4'),tail(`$4'))')')')')_ _ define(`alphabet',_ `(`0',(`1',(`2',(`3',(`4',(`5',(`6',(`7',(`8',(`9',_ (`A',(`a',(`B',(`b',(`C',(`c',(`D',(`d',(`E',(`e',_ (`F',(`f',(`G',(`g',(`H',(`h',(`I',(`i',(`J',(`j',_ (`K',(`k',(`L',(`l',(`M',(`m',(`N',(`n',(`O',(`o',_ (`P',(`p',(`Q',(`q',(`R',(`r',(`S',(`s',(`T',(`t',_ (`U',(`u',(`V',(`v',(`W',(`w',(`X',(`x',(`Y',(`y',_ (`Z',(`z',()))))))))))))))))))))))))))))))))))))))_ ))))))))))))))))))))))))')_ _ _ `alphabet' is fragile. The item quotes disappear _ when tails are copied repeatedly into arguments of _ `cmp' and `_cmp'. Then macros with one-character _ names will alter the list and ruin comparisons. _ _ `cmp' is costly in time and space. Iteration over the _ alphabet involves copying a tail of the alphabet _ N times, for a total volume of O(N^2) list elements, _ where N is the size of the alphabet. Up to N^2 ghost _ definitions can accumulate over many executions. _ _ Ghost definitions are not only numerous; they are _ repeatedy redefined. If all alphanumeric pairs were _ equally likely, an average of about 120 redefinitions _ and 60 copies of 45-element alphabet tails would occur _ for each execution of `cmp'. _ _ The overheads of redefinition and argument-copying can _ be avoided by using an N^2-way switch instead of linear _ search in the alphabet. We automate the construction _ of the N^2 case declarations with `mkcmp' and a few _ helper functions defined below. _ _ Making the case macros all at once will in the long _ run beat making a lot of ghost definitions for every _ character comparison. Moreover, the noted fragility _ of `alphabet' may be mitigated, for no macros defined _ in this program will use it once the case macros _ have been defined. _ mkswitch2of2(`cmp',`$1', `$2')_ _ _ `mkcmp0' makes comparison functions for $1 against _ $2, which is known to follow $1 in alphabetic order _ define(`_mkcmp0',_ __`case(`cmp_$1_$2',`LT')'_ __`case(`cmp_$2_$1',`GT')')_ _ _ `_mkcmp1' makes comparison functions for $1 against all _ characters in list $2. _ define(`_mkcmp1',`if(eql($2,()),`',_ __`_mkcmp0($1,head($2))'_ __`_mkcmp1($1,tail($2))')')_ _ _ `_mkcmp2' makes comparison functions for all pairs _ constructible from $1 and alphabetically following _ characters, which are listed in $2. _ define(`_mkcmp2',_ __`case(`cmp_$1_$1',`EQ')'_ __`if(eql($2,()),`',_ ____`_mkcmp1($1,$2)'_ ____`_mkcmp2(head($2),tail($2))')')_ _ _ `mkcmp' makes comparison functions for every pair of _ (not necessarily distinct) characters chosen from _ the alphabet. _ define(`mkcmp',`_mkcmp2(head($1),tail($1))')_ _ mkcmp(alphabet)_ _ _ Strings can be defined as lists of characters, and _ compared by a list-comparison function, `cmpl'. _ define(`cmpl',_ __`if(and(isempty($1),isempty($)), `EQ',_ __`if(isempty($1), `LT',_ __`if(isempty($2), `GT',_ __`_cmpl($1,$2)')')')')_ _ mkswitch1of2(`_cmpl',`cmp(head($1),head($2))')_ case(`_cmpl_LT', `LT')_ case(`_cmpl_EQ', `EQ', `cmpl(tail($1),tail($2))')_ case(`_cmpl_GT', `GT')_ _ _ Example _ _ cmpl((A,(N,())), (A,(M,()))) _ ==> _cmpl((A,(N,())), (A,(M,()))) _ ==> mkswitch1(`_cmpl', cmp(A,A))((A,(N,())), (A,(M,()))) _ ==> _cmpl_EQ((A,(N,())), (A,(M,()))) _ ==> cmpl(tail((A,(N,()))), tail((A,(M,())))) _ ==> GT_ _ _ Universality _ _ It's not hard to envision simulating a simple computer, except _ for interactive input-output--a deficiency that is shared with _ Turing machines. Thus, aside from resource limitations, bare _ m4 is Turing complete. _ _ Here's a workable plan: _ _ The program counter is maintained by `incr', and converted _ to string form by `base2'. _ _ Word addresses are binary mumbers. _ _ The contents of word n are held in a macro named W, where _ is the base-2 form of the address. _ _ Words need not be limited in size. Numbers may be kept _ in the form (sign,magnitude). (Caveat: the sign can't _ be `+' or `-', because programs can only switch on _ alphanumerics.) _ _ Instructions may be of variable length--one word of opcode, _ and following words that contain immediate operands or _ addresses expressed in binary form. _ _ Instruction codes are a limited set of binary numbers , _ and may be switched on by converting them to base-2 form I. _ _ An assignment redefines the designated word. _ _ Other instructions are programmed in styles illustrated above. _ _ Each instruction updates the location counter appropriately. _ _ A control macro fetches an opcode according to the location _ counter. If the opcode is a halt instruction, the control _ terminates. Otherwise it calls the designated operation, _ then calls the control recursively. _ _ M. Douglas McIlroy _ Dartmouth College _ August 2020 _ January 2021, comments tweaked _ September 2024, replace eq to work on arbitrary atoms _ June 2025, add Comparison functions and Appendix, _ revised switch declarations and description thereof _ _ Appendix _ _ This appendix rounds out the usual complement of _ arithmetic operators on natural numbers. No new _ programming techniques are introduced. _ _ Square. _ define(`square', `mul($1,$1)')_ _ _ Exponentiation. pow(a,n) returns a^n. If n=b+2*n', where _ b is 0 or 1, pow(a,n) = a^b*pow(a^n')^2. _ define(`pow',_ __`if(iszero($2), `one',_ __`if(eq(head($2),0), `square(pow($1,tail($2)))',_ __`mul($1,square(pow($1,tail($2))))')')')_ _ _ Subtraction. Because of the lack of negative numbers, _ we provide a "diminish" function, `dim', that clamps _ negative results at zero. _ _ dim(a,b) = max(a-b,0) _ _ The definition of `_dim' exploits the fact that $1>$2 _ when it is called from `dim'. Thus at every recursive _ level $1>=$2. The last line of `_dim', where $1=(0,..) _ and $2=(1,...), implements traditonal "borrowing" from _ the tail of $1 by adding to the tail of $2. _ define(`dim', `if(le($1,$2),`()',`trim(_dim($1,$2))')')_ _ define(`_dim',_ __`if(iszero($2), `$1',_ __`if(eq(head($1),head($2)), `(0,_dim(tail($1),tail($2)))',_ __`if(eq(head($1),1), `(1,_dim(tail($1),tail($2)))'_ __,`(1,_dim(tail($1),succ(tail($2))))')')')')_ _ _ Division. The customary integer operators `div' and `mod'_ _ are defined in terms of a quotient-remainder operator, `qr'. _ define(`div',`head(qr($1,$2))')_ define(`mod',`tail(qr($1,$2))')_ _ _ qr(n,d) yields a pair (q,r), the integer quotient and _ remainder for the division operation n/d, where d>0. _ The result is expressed in "strict (q-r) form". _ _ For expository purposes, we write n in q-r form as _ [q,r], where n = q*d+r. When r