File Glossary.html    Author McKeeman    Copyright © 2007    index

The xcom Glossary

Technical Terms

  • arena -- allocating things.

    Memory allocation can be retail (C malloc) or wholesale (suballocation out of a large pre-allocated arena). Arena allocation is always an attractive alternative because the entire arena can be freed as a unit when it is no longer needed. In MATLAB where there are no pointers and no malloc, arena allocation is especially attractive. An index into the arena locates data. The data can contain other indices, providing the same kind of data structures available with C pointers. Increasingly relevant, the size of the index type need not be as big as the computer address space, which can lead to a considerable savings in the space needed to store the links. Because all links are arena-relative, nothing is lost if the arena must be enlarged (C realloc).

    The syntax trees of xcom are arena allocated.

  • big bang -- bad planning.

    The big bang in software results when a big change has to be made all in one step. Nothing works until it all works. There are many disadvantages to such a plan. The most worrisome to management is that there is no way to tell if something has gone wrong until much time and effort has been spent. A big bang can lead to a big slip in delivery, or project cancellation. Almost everyone in the business knows of one example. One alternative is incremental change -- do a little; test a lot.

    Another alternative is to prototype. A series of ever more ambitious prototypes, each of which demonstrates more of the final objective, and perhaps each done to higher software engineering standards, provides milestones and the opportunity to learn something. The danger in prototyping is that a less-than-final stage may get turned into a product.

    If the software is modular, each module can become a subproject. Unit tests can be implemented for each module, supplying useful milestones.

  • bind -- connecting things.

    Programming languages and mathematical notation share a concept of operator finding in expressions. The words precedence, priority, and/or hierarchy are also often used to describe this concept. Readability depends on the reader of an expression understanding what are the operands of each operator. The association of operands and operators can always be made explicit with parentheses but readability suffers. Saying that * binds more tightly than + (which is true in X) allows the reader to immediately infer the equivalence of the following examples.

    1+2*3+4<-->1+(2*3)+4
    1*2+3*4<-->(1*2)+(3*4)

  • cache -- faster memory.

    Any particular computer may have several kinds of cache memory. Cache memory is faster than main memory. The idea is that a copy of parts of main memory is kept in the cache and therefore can be accessed faster than it could be directly from main memory. A cache miss requires that many words of data be loaded from main memory into the cache before the needed word of memory can be used. If main memory is randomly accessed, a cache will not help much.

    Performance optimization often depends more on managing the cache than it does on instruction selection.

  • compile time.

    Compile time has two meanings.

    1. Compile time can mean the time elapsed during the production of an executable form. This is the metric that JITs try to minimize since it is directly related to perceived performance. For example, "The compile time is 10-6 seconds per line of source code"

    2. Compile time can mean a phase of the total process of getting code into execution. For example, "Symbol table lookup occurs at compile time." Other analogous phases might be edit time, link time, load time or run time.

  • fixup -- late insertion of a value.

    The order of events in a compiler may force partly-done code to be produced. A fixup is a process which, at a later time, finishes the job by placing the missing values into the partly-done code. The typical situation requiring a fixup is a forward conditional branch.

  • hash -- randomized access for fast lookup.

    The simplest way to search a large table of strings is from one end to the other.

    A faster way is to compute a small repeatable random number based on the the characters in each string in the table. Assuming there are N entries in the table and the small random number is 0 <= H < M, then items with the same hash H can be linked together. When it is time to look something up, the hash H is computed and only the approximately N/M entries on the corresponding list need to be examined.

    A typical value for M is 256. The lookups for a large table are thereby speeded up by a factor of 256.

  • opaque -- a value that can be assigned but not examined.

    A separation of concerns can be established if one part of a program computes values and another simply holds the values to pass back. The value manager does not need the capability of examining the values it manages.

  • pass -- one of a series of processes on a single set of data.

    When memories were small, all compilers had to make multiple passes over the source input, each pass collecting and tabulating some particular kind of information, perhaps writing the data to secondary store to be read in again later. The number of passes approached 100 in the most elaborate cases.

    Now it is rare to look at the source input more than once, but it is still common to repeatedly process some intermediate form, and even more common to organize the compiler to transform an intermediate form for a complete compilation unit before starting on the next process.

    In xcom, the symbol table is implemented as multiple passes to enable the propagation of type information across the control flow of the user program.

  • PFN -- Parenthesis-free Notation.

    PFN is a notation invented by Polish mathematician Jan Łukasiewicz. The original notation was "operator-early" or "prefix", that is 1+2 was represented by +12. The form used as executable machine code for expressions on stack-based computers is "operator-late" or "postfix", that is 1+2 is represented by 12+. Sometimes it is called "Reverse Polish", but that name is not popular in Poland.

    Here is what the author himself had to say about the notation: "I came upon the idea of a parenthesis-free notation in 1924. I used that notation for the first time in my article Lukasiewicz(1), p. 610, footnote. See also Lukasiewicz (2) pp. 7 and 38, and Kotarbinski, p. 244."

    In fact, Lukasiewicz(1) mentioned above is a lithographed report in Polish.

  • plug compatible -- interchangeable software units.

    In electronics, if two things need to be connected, then they must have matching connectors (called a plug and socket). For example, a USB port is a kind of socket and obviously can only be used with the corresponding USB plug.

    In software, two modules are "plug compatible" if they can be swapped. This requires that they have the same software interface, as well as doing the same job.

  • pun -- a construct that pretends its bits are of another type.

    There are three puns in the xcom implementation.

    The first two are related: f2i and i2f. The runtime frame has M type int32. When the variable is an X integer, it can be accessed normally from MATLAB. When the variable is an X real, MATLAB must be fooled into treating the value as MATLAB single.

         x=i2f(frame(i))
      

    makes x have MATLAB type single but keeps the exact bits that are in the frame at index i.

         frame(i)=f2i(x)
      

    puts the bits back. In this way a MATLAB vector can hold a mixture of types as required for the runtime frame.

    The third pun fools the C/C++ compiler into treating a data pointer as though it were a pointer to a function. This trick is used to do the "go" step in a load&go compiler. It is found in Asmx86.m.

  • pure -- no side effects.

    A function is pure if it has no effect except for the value(s) returned.

  • RAM -- Random Access Memory.

    The term RAM was in common use when programs and data resided in a large memory which had uniform access speed. No modern computer has such a thing anymore; all memory has non-uniform access times. Main Memory is a better term. Main memory is accessed via cache memory.

  • short circuit -- early exit for ~, &, and |.

    The logical operators follow a simple algebra. There are two common ways to compute their results. The most obvious is ordinary expression evaluation using the hardware arithmetic. Since there are a lot of bits in a typical hardware word, this kind of logical evaluation is intrinsically parallel.

    The less obvious way is early-exit -- evaluating only as much of the logical expression as is necessary to determine the result. One expects early-exit to be faster. The important side effect of the second trick is that undefined values that are never reached do not cause trouble.

  • side effect -- unobvious change.

    The main purpose of a piece of code (e.g. rand delivers a number) may be accompanied by another purpose (e.g. updating the seed). Intended or otherwise, side effects pose problems for maintenance.

  • wart -- ill-considered program design.

    As software ages, changes are made. It seems as if the new contributor does not have much of an idea about the gestalt of the orginal author. Since there are many places to "hang" a fix and pass the test suite, the "where" gets scant consideration. A wart is such a piece of code, ill-placed or ill-written. The long-term effect of warts is to cause a large rewrite.