
COSC48 Implementation of Programming Languages

Lecture 17

Storage Management

Review

  • optimization

Storage layout

In the old days...

        real memory
0x0 +-------------------+
    | reserved OS space |
    +-------------------+
    |    user code      | placed at load time
    +-------------------+
    |      heap         |
    |       |           |
    |.......v...........| <-- break (moved by sbrk)
    |                   |
    |     avail         | out-of-memory when they collide
    |...................|
    |       ^           |
    |       |           |
    |     stack         |
max +-------------------+

Multiprocess view...
      real memory       page table        user 1        swap
0x0 +---+---+---+---+   virt|real/        virtual       disk
    |   |   |   |   |   addr|disk         memory
    +---+---+---+---+   +---+---+     0x0 +----+   0x0 +----+
    |   |   |   |   |   +---+---+         |    |       |    |
    +---+---+---+---+   +---+---+         |    |       |    |
    |   |   |   |   |   +---+---+         |    |       |    |
    +---+---+---+---+   +---+---+         |    |       |    |
    |   |   |   |   |   +---+---+    2^64 +----+       |    |
max +---+---+---+---+   +---+---+                      |    |
                        +---+---+                      |    |
                                                  tera |    |
                                                  bytes+----+

More modern view...
   level 1   level 2   level 3   real
   cache     cache     cache     memory  ...
   +-+-+     +-----+   +-----+   page
   +-+-+     +-----+   |     |
             +-----+   |     |
             +-----+   +-----+
                       |     |
                       |     |
                       +-----+

Heap Management

Typical heap element
   +------+-----------+
   |secret|  malloc-d |
   |header|  by user  |
   +------+-----------+

Free space is linked together into a free list (the minimum element size is dictated by the need to hold the link). free(p) returns an element to the free list, merging it with adjacent free elements if possible.

malloc(n) splits a free element: part goes to the RW (in-use) list, part goes back to the free list. The free list needs to be sorted by address to allow merging, but sorted by size to implement best fit. A balanced tree of free nodes can be used instead of a list to find a free candidate quickly.
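The splitting and merging described above can be sketched as a toy first-fit allocator over a fixed arena. All names here (toy_init, toy_malloc, toy_free, ARENA_BYTES) are made up for illustration, and the layout is an assumption: the "secret header" is a single size word, and the link lives in the user area while the element is free. It is a sketch of the idea, not how any real malloc is built.

```c
#include <stddef.h>

typedef struct Block {
    size_t size;          /* secret header: bytes in this element, header included */
    struct Block *next;   /* link; overlaps user data, valid only while free */
} Block;

#define ARENA_BYTES 4096
#define HEADER sizeof(size_t)

static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static Block *free_list;                 /* kept sorted by address */

void toy_init(void) {
    free_list = (Block *)arena;
    free_list->size = ARENA_BYTES;
    free_list->next = NULL;
}

/* Round up so every element can hold a link once it is freed. */
static size_t round_up(size_t n) {
    return (n + sizeof(Block) - 1) / sizeof(Block) * sizeof(Block);
}

void *toy_malloc(size_t n) {
    if (n > ARENA_BYTES) return NULL;
    size_t need = round_up(n + HEADER);
    for (Block **link = &free_list; *link; link = &(*link)->next) {
        Block *b = *link;
        if (b->size < need) continue;             /* first fit */
        if (b->size - need >= sizeof(Block)) {    /* split: tail stays free */
            Block *rest = (Block *)((unsigned char *)b + need);
            rest->size = b->size - need;
            rest->next = b->next;
            *link = rest;
            b->size = need;
        } else {
            *link = b->next;                      /* too small to split */
        }
        return (unsigned char *)b + HEADER;
    }
    return NULL;
}

void toy_free(void *p) {
    if (p == NULL) return;
    Block *b = (Block *)((unsigned char *)p - HEADER);
    Block *prev = NULL, *cur = free_list;
    while (cur && cur < b) { prev = cur; cur = cur->next; }  /* address order */
    b->next = cur;
    if (prev) prev->next = b; else free_list = b;
    if (cur && (unsigned char *)b + b->size == (unsigned char *)cur) {
        b->size += cur->size;                     /* merge with successor */
        b->next = cur->next;
    }
    if (prev && (unsigned char *)prev + prev->size == (unsigned char *)b) {
        prev->size += b->size;                    /* merge with predecessor */
        prev->next = b->next;
    }
}
```

Because the list is sorted by address, free only has to look at the two neighbors to merge. A best-fit policy would instead want the list ordered by size, which is the tension noted above.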

Hard problems:

  • make storage management fast
  • which free element to split
  • avoid fragmentation
  • support threads

For every solution there is a demon program that will run inefficiently.

Interesting experiment -- reverse engineer the secret header:

  • find real allocation size for malloc(n) as a function of n by printing the difference of addresses of successive malloc-s
  • print the contents of the "gap" between user-visible malloc-d storage
  • repeat the above after calling free
  • find the free list
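A possible starting point for the first step of the experiment is sketched below. The helper names malloc_stride and print_strides are invented for this sketch. The results depend entirely on the C library in use, and nothing guarantees that successive malloc-s are adjacent, so treat the printed numbers as hints about the header, not as facts. The sketch stops short of reading the gap itself, since reading outside the malloc-d storage is not defined by the C standard.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Distance in bytes between the results of two successive malloc(n) calls.
   Returns 0 if either allocation fails. */
long long malloc_stride(size_t n) {
    char *a = malloc(n);
    char *b = malloc(n);
    long long d = 0;
    if (a && b)
        d = (long long)(intptr_t)b - (long long)(intptr_t)a;
    free(b);
    free(a);
    return d;
}

/* Tabulate the stride as a function of n. */
void print_strides(void) {
    for (size_t n = 1; n <= 128; n *= 2)
        printf("malloc(%zu): successive results are %lld bytes apart\n",
               n, malloc_stride(n));
}
```

If the stride is larger than n, the difference is a first estimate of header size plus rounding.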

Garbage Collection

In addition to the OS-level complexities above, using malloc() and free() is error prone at the user level.

malloc/free problems:

  • storage leaks
  • dangling pointers

Starting with LISP (early 1960s), garbage collection has been provided to avoid these problems. Because LISP was uniform and simple, gc was relatively simple. Times changed. By 1965 the XPL system had a generational compactifying garbage collector for strings.

Garbage collection is based on the fact that variables reside in known places (stack, static, ...). Some variables may contain pointers to allocated storage. If the layout of that storage is known, and it in turn contains pointers to allocated storage with known layout, then all in-use elements can be found by following pointers starting from the named program variables. Furthermore, there must be a way to find all allocated storage, in use or not.

So the garbage collection trick is to:

  1. pause the user program
  2. mark all allocated elements as AVAIL
  3. mark all the useful elements as INUSE (clobbering some of the AVAIL markers)
  4. return all elements still marked AVAIL to the free list
  5. unpause the user program
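The five steps can be sketched on a toy heap in which every element has the same known layout (two pointer fields). The names Obj, gc_new, gc_set_root and gc_collect are invented for this sketch, and the roots array stands in for the stack and static variables. Pausing and unpausing are implicit because the toy is single-threaded.

```c
#include <stddef.h>

enum { AVAIL, INUSE };

typedef struct Obj {
    int allocated;             /* handed out by gc_new and not yet reclaimed */
    int mark;                  /* AVAIL or INUSE, valid only during a collection */
    struct Obj *left, *right;  /* known layout: exactly two pointer fields */
} Obj;

#define HEAP_CELLS 8
#define NROOTS 4
static Obj heap[HEAP_CELLS];   /* all allocated storage is found by scanning this */
static Obj *roots[NROOTS];     /* stand-in for stack and static variables */

void gc_set_root(int i, Obj *o) { roots[i] = o; }

Obj *gc_new(Obj *left, Obj *right) {
    for (int i = 0; i < HEAP_CELLS; i++)
        if (!heap[i].allocated) {
            heap[i].allocated = 1;
            heap[i].left = left;
            heap[i].right = right;
            return &heap[i];
        }
    return NULL;               /* heap full; a real system would collect here */
}

static void mark_inuse(Obj *o) {
    if (o == NULL || o->mark == INUSE) return;   /* already visited: stops on cycles */
    o->mark = INUSE;
    mark_inuse(o->left);
    mark_inuse(o->right);
}

/* Returns the number of elements given back to the free pool. */
int gc_collect(void) {
    int freed = 0;
    for (int i = 0; i < HEAP_CELLS; i++) heap[i].mark = AVAIL;   /* step 2 */
    for (int i = 0; i < NROOTS; i++) mark_inuse(roots[i]);       /* step 3 */
    for (int i = 0; i < HEAP_CELLS; i++)                         /* step 4 */
        if (heap[i].allocated && heap[i].mark == AVAIL) {
            heap[i].allocated = 0;
            freed++;
        }
    return freed;
}
```

Note that an element pointing only at itself is still collected: reachability from the roots is what counts, not whether anything points at the element.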

Hard problems:

  • knowing storage layout
  • make gc fast
  • make gc unobtrusive

For every solution there is a demon program that will run inefficiently.

Cheap Solution -- Conservative Collectors

Even if the system does not know the storage layout (as in uses of malloc), a conservative kind of gc can be used. Assume pointers are constrained to be in aligned storage. Examine all variables. Only some of them will have bit-patterns that could be heap addresses (within the heap range, trailing 00 bits). Examine all aligned values in the heap elements pointed to by those bit-patterns. Continue recursively. All in-use store will be reached. Any store the scan never reaches is certainly garbage and can be collected. By chance, some garbage will be reached through a bit-pattern that only looks like an address, and that garbage is retained. This has been implemented with some success for C and C++. It can be used alongside free.
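The "could this be a heap address" test is small enough to sketch. The function names and the heap bounds passed in are assumptions for illustration. The alignment mask uses the pointer size of the machine, so it checks two trailing zero bits on a 32-bit system and three on a 64-bit one.

```c
#include <stddef.h>
#include <stdint.h>

/* Could this bit-pattern be the address of an aligned word in the heap? */
int could_be_pointer(uintptr_t bits, uintptr_t heap_lo, uintptr_t heap_hi) {
    return bits >= heap_lo && bits < heap_hi           /* within the heap range */
        && (bits & (sizeof(void *) - 1)) == 0;         /* trailing zero bits */
}

/* Count the candidate pointers in a run of aligned words, such as a
   stack frame or the body of a heap element. */
size_t count_candidates(const uintptr_t *words, size_t n,
                        uintptr_t heap_lo, uintptr_t heap_hi) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (could_be_pointer(words[i], heap_lo, heap_hi))
            hits++;
    return hits;
}
```

An integer that happens to pass this test is treated as a pointer. That is the conservative part: the collector may keep some garbage, but it never frees something still in use.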

Compacting Collectors

If pointers cannot be secretly held and later revealed, their values are available to be changed by the garbage collector. All in-use elements can then be moved to the lowest addresses, and every pointer to a moved element is updated. This solves the fragmentation problem. Of course, in C, this won't work. But it does in Java. After collection all INUSE memory is at the bottom of the heap and all free memory is in one block.
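A sliding compaction can be sketched on a toy heap where "pointers" are cell indexes, so the collector can see and rewrite every one of them. The Cell type, the compact function, and the use of -1 for "no pointer" are assumptions of this sketch. It also assumes marking has already been done, so every in-use cell refers only to in-use cells.

```c
#include <stddef.h>

#define CELLS 8

typedef struct {
    int inuse;   /* set by an earlier mark phase */
    int value;   /* payload */
    int ref;     /* index of another cell, or -1 for none */
} Cell;

/* Slide the in-use cells to the lowest indexes and fix up every pointer.
   roots[] holds cell indexes (or -1). Returns the number of in-use cells. */
int compact(Cell heap[CELLS], int roots[], int nroots) {
    int forward[CELLS];
    int next = 0;
    for (int i = 0; i < CELLS; i++)               /* pass 1: new addresses */
        forward[i] = heap[i].inuse ? next++ : -1;
    for (int i = 0; i < CELLS; i++)               /* pass 2: rewrite pointers */
        if (heap[i].inuse && heap[i].ref >= 0)
            heap[i].ref = forward[heap[i].ref];
    for (int r = 0; r < nroots; r++)
        if (roots[r] >= 0)
            roots[r] = forward[roots[r]];
    for (int i = 0; i < CELLS; i++)               /* pass 3: move, low to high */
        if (heap[i].inuse && forward[i] != i) {
            heap[forward[i]] = heap[i];
            heap[i].inuse = 0;
        }
    return next;
}
```

After the call, cells 0 through the returned count minus one are in use and everything above is one free block. The pointer rewrite in pass 2 is exactly what C cannot permit, because a C program may hold an address somewhere the collector cannot find.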

Generational Collectors

One hypothesis is that anything that survived one collection is likely to survive the next. That is, the programmer is probably using it for something of long duration. The garbage collector may then concentrate on recently created elements with the expectation of lower costs for the expected return in collected storage. Eventually storage runs out so a major collection is initiated, starting the process over. The compacting and generational ideas work together well since compacting collects all the elements of a generation together. As it turns out, collecting only the oldest also works. It depends on individual program behavior.

Partitioned Collectors

One can also separate allocated storage into different kinds of data. For instance, all items of one type can be put in a subheap which can be collected independently of other subheaps. A particularly effective Java trick is to isolate strings from the rest of the stuff. They are roots. There is a simple collecting algorithm. The size of the space is smaller and therefore faster to collect. Furthermore, reference counting can be used because strings never point to anything else.
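Reference counting fails on cycles, and a string cannot be part of one because it holds only characters. A minimal sketch of a counted string follows. The names RcString, rc_new, rc_retain and rc_release are invented here, and this is not a claim about how any Java system stores its strings.

```c
#include <stdlib.h>
#include <string.h>

typedef struct RcString {
    int refs;       /* number of live references */
    char text[];    /* characters only: no outgoing pointers, so no cycles */
} RcString;

/* Create a string with one reference, or return NULL if out of memory. */
RcString *rc_new(const char *text) {
    RcString *s = malloc(sizeof *s + strlen(text) + 1);
    if (s == NULL) return NULL;
    s->refs = 1;
    strcpy(s->text, text);
    return s;
}

/* Record one more reference to s. */
RcString *rc_retain(RcString *s) {
    s->refs++;
    return s;
}

/* Drop one reference. Returns 1 if it was the last and s was freed. */
int rc_release(RcString *s) {
    if (--s->refs > 0) return 0;
    free(s);
    return 1;
}
```

The storage comes back the moment the last reference is dropped, with no pause and no marking pass over the string subheap.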

Compiler Help -- Precise Garbage Collection

One does not need to guess at storage layout if the compiler symbol table is still around. Given a root address, find the symbol table entry. The entry provides the layout of the root data structure, including the types of pointers in it. Follow the pointers. The type of the destination is known, therefore the layout. Continue recursively.

Only the layout (location of pointers) is necessary for precise garbage collection. Suppose that at "new" time, a bit pattern indicating which words contain addresses was stored in the "secret" malloc structures. Walking the INUSE data would then be exact and efficient.
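The bit-pattern idea can be sketched as follows. The PObj type, its fixed four-word body, and the mark_precise function are assumptions made for this example. The point is only that the marker follows a word when its bit is set and ignores it otherwise, however much the word looks like an address.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct PObj {
    unsigned marked;      /* set once the object has been reached */
    unsigned nwords;      /* words in use, at most 4 in this toy */
    uint32_t ptr_bits;    /* bit i set means words[i] holds an address */
    uintptr_t words[4];   /* the user-visible body */
} PObj;

/* Mark everything reachable from o. Returns the number of objects newly marked. */
int mark_precise(PObj *o) {
    if (o == NULL || o->marked) return 0;
    o->marked = 1;
    int n = 1;
    for (unsigned i = 0; i < o->nwords; i++)
        if (o->ptr_bits & (1u << i))               /* a pointer by declaration */
            n += mark_precise((PObj *)o->words[i]);
    return n;
}
```

A word whose bit is clear is never followed, so an integer that coincides with a heap address does not keep garbage alive, unlike in the conservative scheme.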

Adaptive Collectors

One can observe the regular collector for a while, detect a pattern of usage, generate a special collector for that pattern, and share the collection between two (or more) collectors.

Real-time Constraints

If you are watching a dynamic graphic display, or collecting particle data from an accelerator, or implementing an autopilot, you may decide that the program cannot afford to pause long enough to collect garbage.

Parallel Collectors

One might try to implement garbage collection on a parallel thread (open heart surgery while the patient is at work), or an incremental collector that must do what it can in a fixed budget of time (a few milliseconds) or a combination. It can be done.

Parallel garbage collectors are inherently inefficient. The reason is that parallel threads increase the randomness of the load on the cache and virtual memory. Much time is wasted in cache misses and overlays.

Incremental Collectors

One can also devise incremental collectors that are guaranteed to run within the allotted maximum pause time and guaranteed to collect at least some garbage. Such results are publishable.

Debugging

The worst problem in real-time work is debugging. The debugger itself makes demands on the system, changes timing, produces garbage. The best solution I have found is a user program that

  • randomly creates threads (lots of them)
  • randomly creates data structures (all strung together)
  • destroys the above randomly
  • explicitly calls the garbage collector at random times.

Such a program usually causes hangs, segmentation faults, and various other obvious faults. One of mine had an MTBF of 3 seconds when first tried on a production Java system. After several man-years of effort, chasing one bug at a time, it had an MTBF of 3 days. Good enough? What would you expect if a new garbage collector were rolled in? Or several copies of the program were run at once? Or the hardware got upgraded?



Created: Thursday, April 25, 2001
Last modified: Wed May 23 21:30 EDT 2001