File Projects.html    Author McKeeman    Copyright © 2007    index

Projects

If you cannot solve the proposed problem
try to solve first some related problem.
Could you imagine a more accessible related problem?
A more general problem?
A more special problem?
An analogous problem?
George Polya, How to Solve It (1945)

The compiler project is to extend X in some way that you find useful. This does not, however, have to be done all in one step. One can, to paraphrase Polya, "try a simpler similar problem first." Extending xcom carries some expectations, such as the ability to do input/output for all types of values.

The result of the compiler project is

  1. A live demonstration.
  2. A report signed by all team members.

The oft-repeated and fundamental beginner's error is trying to build the next great language as a learning project. Do not try to solve the problems of the world yet. You will have enough of your own.

My advice is to read this page to the end before making any decisions, but don't let what you read limit your creativity.

You should first write a number of programs in your proposed extension. This provides both a set of tests and a confirmation that you will actually find the extension useful. You should then extend the X reference manual to document your language. If you do not want to grapple with LaTeX, you can publish an HTML appendix instead. Then you are ready to extend xcom.

  • Warm up exercise.

    Add the absolute value function to X. You should follow each of the recommended steps to make a little change in X. This warm up exercise avoids any language design, and any serious pitfalls. It will be hard enough without them.

  • Change xcom

    The recommended method for changing xcom involves the order of events and also the use of the unit tests. When you get stuck, ask for help.

  • Adding a scalar type (relatively easy)

    Suppose you wanted to do more accurate scalar computations.

    • Double Precision

      You might prefer 64-bit IEEE to the current xcom 32-bit IEEE implementation of real variables. In fact, the hardware always does 64-bit arithmetic; xcom goes to some trouble to truncate the results after every operation. Expanding the storage for scalars in the run-time frame is the only substantive part of this extension. Not much to be learned about compiler writing. It is probably reasonable to extend the integer precision to 64 bits at the same time so as to have a consistent storage model.

    • Loooong Integers

      Some computations require many digits of integer accuracy. One can extend xcom without changing the grammar at all by simply changing the meaning of integer constants. The (difficult) implementation of the arithmetic can be left to the BIGNUM package or the MathWorks variable precision file exchange entry.

    • Extended Accuracy

      The BIGNUM package offers much more than big integers. One could implement big rational numbers or super-accurate reals.

    • Complex

      Replace real with complex, or add a new type complex. You may want to reserve "i" for the square root of -1. Do not push too much off onto the lexer. Expression

            a+i*b

      is already syntactically correct. You might want to add builtin functions modulus and phase.

    • People

      One group decided to make a scalar type for people. The language had to do with relationships. Identifiers starting with a single uppercase letter such as John were constants.

  • Adding an operator

    While one can invent arbitrary operators composed out of ASCII characters, it is often best to overload an existing operator to keep X from getting cluttered. The addition of set union and string catenation are discussed below under the corresponding data structures. The warm up exercise is an example where the operator is abs.

    Consider

    • adding real-valued builtins inf and nan and logical-valued builtins isinf and isnan.
    • adding   ~~   &&   ||   to mean short-circuit logical evaluation
    • extending   ~   & and   |   to type integer (giving 32-bit results). Perhaps add ^ meaning exclusive or.

    All of mathematical notation awaits your implementation.

  • Adding a data structure

    The X language has only scalar variables. This makes it nearly impossible to write useful programs. One is, then, highly motivated to add structured data variables. By the way, python has worked out many of the language design problems for data structures. It is OK to peek.

    • Sets of scalar values.

      One can imagine writing

          X := {1,2,3};
          y := choice(X);       ` a randomly chosen member
          Z := {i2r(y)} | {3.141592};
          if ~(4.0 in Z) ? fi;  ` assert 4.0 is NOT there
          X := X - {y};         ` discard y from X
              

      wherein finite sets of values are allowed. In essence this means adding a few new symbols to X.cfg, three new types (logicalSetType, integerSetType, realSetType) to the symbol table, and then carrying through the implementation.

      A set computation will necessarily allocate new storage for each result. The storage needs to be accessible from your emitted code, so it is more convenient to let C allocate and free the storage. xcom contains a convenient mechanism for dropping into C for such actions (see the implementaton of /, // and rand in file getCfun.c). You might consider how to avoid storage leaks. You might also consider a set-complement operation.

      Note the implied overloading of the '|' operator to mean set union above. You may, of course, choose one of many other ASCII notational devices, perhaps '||' or '|_|' or '\/' or 'U'.

      The sets idea quickly leads to sets of anything, including sets. The mathematics can get deep fast. Step carefully.

    • Strings

      One can imagine writing

              X := "abc" + "def";
              y := 3;
              Y := X[y:y+1];
              z := c2i(X[2]);

    • Vectors of scalar values

      One can imagine writing

              y := 3;
              X := [1,2,y];
              Y[3] := X[y-1];

      As in MATLAB, vectors expand storage to hold any value sent their way. Accessing an undefined value causes a run error. The representation of a vector in memory might be a simple version of the MATLAB representation. In the variable is an address, which points to a header, which points to the data. Both the header and the data must be allocated. The data might need to be reallocated as the computation proceeds.

                         len   ptr        data
                       +-----+----+     +-------+     
              x: o---->|  4  |  o-----> |1 2 3 4|
                       +-----+----+     +-------+
            
    • Lists of lists

      One can imagine writing

              y := 3;
              X := <1,2,y>;
              Y := <X, <7,9.0>> + <3.12>;
              Z := tail(Y);

    • Abstraction

      Objects and structs are hierarchical data structures with named fields. In C one might write

              struct T {
              int x;
              float y;
              } t;
            

      to get a two-field struct. In MATLAB once writes

              t.x = 0; 
              t.y = 1.0;
            

      to get the same effect and, in addition, initialize the fields.

      In an extension to X one might write

              T = {x,y};
              t = T{0,1.0};
            

      where the first assignment makes T a two field struct type, and the second initializes the fields and, as a side effect, permanently binds the type of the fields of type T.

  • Doing science

    Not all computations are just grown-up FORTRAN. Some science requires algebraic manipulations. Some has data that is nothing like the computers give us. Compiler writers are rarely competent in such science, and such scientists are rarely competent in compiler writing. There is an opportunity for a big advance here.

    • Chemistry

      A chemist writes formulas in which the operands are chemicals. The equations enforce constraints and imply reactions. The HTML presentation here does not do justice to this idea but X can be typeset by xcom or even rendered by MATLAB graphics.

      water := 2H + O;
      h := water.enthalpy;

      methane := C + 2H2

      caffeine.name = '3,7-Dihydro-1,3,7-trimethyl-1H-purine-2,6-dione'
      caffeine.diagram
      image

  • Little Languages

    Many problems are easier to solve with a little language . There are many successful examples. Instead of modifying xcom, you start from scratch. All things considered, it best to make this your second project.

  • Localization

    In this English-centric world of computing, everyone is forced to learn the English words describing things in X. Perhaps a Latino would prefer si-is to if-fi or entero to integer.

    image The idea is to change nothing in X except its input. What would someone in 中國 want, for example?

    The biggest piece of work is deciding what to do (language design). The second task is to modify the lexer to accept UTF8 instead of ASCII. Then any other place where the source text is touched has to be make "unicode safe".

  • Final Word

    If one is going to achieve Dijkstra's "compelling and deep logical beauty," one must forgo much of the paraphernalia of conventional general purpose programming languages. This spartan attitude is most likely to succeed when the objective of the language is rigorously limited, the linguistic additions are few, and additions are well suited to meeting that objective. See some ideas on computer language design to get a fresh perspective.