|
File Projects.html
Author McKeeman
Copyright © 2007
index
Projects
If you cannot solve the proposed problem
try to solve first some related problem.
Could you imagine a more accessible related problem?
A more general problem?
A more special problem?
An analogous problem?
George Polya, How to Solve It (1945)
The compiler project is to extend X in some way that you find useful.
This does not, however, have to be done all in one step.
One can, to paraphrase Polya, "try a simpler similar problem first."
Extending xcom carries some expectations,
such as the ability to do input/output for all types of values.
The result of the compiler project is
- A live demonstration.
- A report
signed by all team members.
The oft-repeated and fundamental beginner's error is trying
to build the next great language as a learning project.
Do not try to solve the problems of the world yet.
You will have enough of your own.
My advice is to read this page to the end before making any decisions,
but don't let what you read limit your creativity.
You should first write a number of programs in your proposed extension.
This provides both a set of tests and
a confirmation that you will actually find the extension useful.
You should then extend the X
reference manual
to document your language.
If you do not want to grapple with LaTeX,
you can publish an HTML appendix instead.
Then you are ready to
extend xcom.
- Warm up exercise.
Add the absolute value function to X.
You should follow each of the recommended steps to make a
little change in X.
This warm up exercise avoids any language design,
and any serious pitfalls.
It will be hard enough without them.
- Change xcom
The recommended method
for changing xcom involves the order of events
and also the use of the unit tests.
When you get stuck, ask for help.
- Adding a scalar type (relatively easy)
Suppose you wanted to do more accurate scalar computations.
- Double Precision
You might prefer 64-bit IEEE to the current
xcom 32-bit IEEE implementation of real variables.
In fact, the hardware always does 64-bit arithmetic;
xcom goes to some trouble to truncate the results after
every operation.
Expanding the storage for scalars in the run-time frame
is the only substantive part of this extension.
Not much to be learned about compiler writing.
It is probably reasonable to extend the
integer precision to 64 bits
at the same time so as to have a consistent storage model.
- Loooong Integers
Some computations require many digits of integer accuracy.
One can extend xcom without changing the grammar at all
by simply changing the meaning of integer constants.
The (difficult) implementation of the arithmetic
can be left to the BIGNUM
package or the MathWorks
variable precision file exchange entry.
- Extended Accuracy
The BIGNUM
package offers much more than big integers.
One could implement big rational numbers or super-accurate reals.
- Complex
Replace real with
complex,
or add a new type complex.
You may want to reserve "i" for the square root of -1.
Do not push too much off onto the lexer. Expression
a+i*b
is already syntactically correct. You might want to add
builtin functions modulus and phase.
- People
One group decided to make a scalar type for people.
The language had to do with relationships.
Identifiers starting with a single uppercase letter such as John
were constants.
- Adding an operator
While one can invent arbitrary operators composed out of ASCII characters,
it is often best to overload an existing operator to keep X from getting
cluttered.
The addition of set union and string catenation are discussed below under
the corresponding data structures.
The warm up exercise is an example
where the operator is abs.
Consider
-
adding real-valued builtins inf and nan
and logical-valued builtins isinf and isnan.
-
adding ~~ && || to mean
short-circuit
logical evaluation
-
extending ~ & and | to type integer
(giving 32-bit results). Perhaps add ^ meaning exclusive or.
All of mathematical notation awaits your implementation.
- Adding a data structure
The X language has only scalar variables.
This makes it nearly impossible to write useful programs.
One is, then, highly motivated to add structured data variables.
By the way, python
has worked out many of the language design problems for
data structures.
It is OK to peek.
- Sets of scalar values.
One can imagine writing
X := {1,2,3};
y := choice(X); ` a randomly chosen member
Z := {i2r(y)} | {3.141592};
if ~(4.0 in Z) ? fi; ` assert 4.0 is NOT there
X := X - {y}; ` discard y from X
wherein finite sets of values are allowed.
In essence this means adding a few new symbols to X.cfg,
three new types
(logicalSetType, integerSetType, realSetType) to the symbol table,
and then carrying through the implementation.
A set computation will necessarily allocate new storage
for each result.
The storage needs to be accessible from your emitted code,
so it is more convenient to let C allocate and free the storage.
xcom contains a convenient mechanism for dropping into C
for such actions
(see the implementaton of /, // and rand in file getCfun.c).
You might consider how to avoid storage leaks.
You might also consider a set-complement operation.
Note the implied overloading of the '|' operator to mean set union above.
You may, of course, choose one of many other ASCII notational devices,
perhaps '||' or '|_|' or '\/' or 'U'.
The sets idea quickly leads to sets of anything, including sets.
The mathematics can get deep fast. Step carefully.
- Strings
One can imagine writing
X := "abc" + "def";
y := 3;
Y := X[y:y+1];
z := c2i(X[2]);
- Vectors of scalar values
One can imagine writing
y := 3;
X := [1,2,y];
Y[3] := X[y-1];
As in MATLAB, vectors expand storage to hold any value
sent their way. Accessing an undefined value causes a run error.
The representation of a vector in memory might be a simple
version of the MATLAB representation. In the variable is
an address, which points to a header, which points to the data.
Both the header and the data must be allocated. The data might
need to be reallocated as the computation proceeds.
len ptr data
+-----+----+ +-------+
x: o---->| 4 | o-----> |1 2 3 4|
+-----+----+ +-------+
- Lists of lists
One can imagine writing
y := 3;
X := <1,2,y>;
Y := <X, <7,9.0>> + <3.12>;
Z := tail(Y);
- Abstraction
Objects and structs are hierarchical data structures with named
fields. In C one might write
struct T {
int x;
float y;
} t;
to get a two-field struct. In MATLAB once writes
t.x = 0;
t.y = 1.0;
to get the same effect and, in addition, initialize the fields.
In an extension to X one might write
T = {x,y};
t = T{0,1.0};
where the first assignment makes T a two field struct type,
and the second initializes the fields and, as a side effect,
permanently binds the type of the fields of type T.
- Doing science
Not all computations are just grown-up FORTRAN.
Some science requires algebraic manipulations.
Some has data that is nothing like the computers give us.
Compiler writers are rarely competent in such science,
and such scientists are rarely competent in compiler writing.
There is an opportunity for a big advance here.
- Chemistry
A chemist writes formulas in which the operands are chemicals.
The equations enforce constraints and imply reactions.
The HTML presentation here does not do justice to this idea
but X can be typeset by xcom or even rendered by
MATLAB graphics.
water := 2H + O;
h := water.enthalpy;
methane := C + 2H2
caffeine.name = '3,7-Dihydro-1,3,7-trimethyl-1H-purine-2,6-dione'
caffeine.diagram
- Little Languages
Many problems are easier to solve with a
little language
. There are many successful examples.
Instead of modifying xcom, you start from scratch.
All things considered, it best to make this your second project.
- Localization
In this English-centric world of computing,
everyone is forced to learn the English words describing
things in X.
Perhaps a Latino would prefer si-is to if-fi
or entero to integer.
The idea is to change nothing in X except its input.
What would someone in 中國 want, for example?
The biggest piece of work is deciding what to do (language design).
The second task is to modify the lexer to accept
UTF8 instead of ASCII.
Then any other place where the source text is touched has to be
make "unicode safe".
- Final Word
If one is going to achieve Dijkstra's "compelling and deep
logical beauty," one must forgo much of the paraphernalia of
conventional general purpose programming languages.
This spartan attitude is most likely to succeed when the
objective of the language is rigorously limited,
the linguistic additions are few,
and additions are well suited to meeting that objective.
See some ideas on computer language
design to get a fresh perspective.
| |