Read: bonwick01.pdf. Compare to the previous SLAB paper: bonwick94.pdf.

=====[ Heap History: From Doug Lea's malloc() to Vmem ]=====

Once a virtual address space is subdivided into regions by intended use (code, stack, heap, etc.), heaps must be managed on a more granular level of _chunks_ or _objects_.

Chunks used to be the only way memory allocation worked: when needed, a program (or kernel) called malloc() to allocate a chunk of the size given in the argument, and called free() when ready to release that memory chunk back into the heap. Care must be taken to avoid memory fragmentation: imagine that you fill a 2M heap with lots of 16-byte structs in a row, then release every second one of them. Even though you'll have 1M worth of free bytes, you won't be able to allocate any structure larger than 16 bytes unless you have a way of moving around the structs still in use without breaking the pointers that point to them.

This is still the best one can do when we don't know how much memory will be needed next. However, an OS knows that it will need a bunch of proc_t, vnode_t, kthread_t, and other structures of known sizes. So instead of mixing chunks of these known and frequently used sizes in a single heap, we can allocate them in dedicated heaps, and know that the next struct of a given size will always fit if free slots are available (and if not, we'll grab another dedicated page). This method of allocation is called _slab_ allocation (the slab is the page where structs of the same size and layout lie back to back). In OpenSolaris/Illumos and Linux the allocator that handles slabs (also called "object caches") is called the KMEM allocator. This method can also apply to any application that creates many instances of a particular struct.

We will first review the history of heaps, then examine the VMEM allocator that improved on KMEM.

============= Legacy BSD kernel memory allocator =============

Before we start on OpenSolaris' Vmem allocator, it will be instructive to look at the legacy BSD kernel memory allocator. It is a simple "first-fit" allocator, and it worked well enough for single-processor machines.

In this code, the sequence of "struct map"s traversed by incrementing bp acts as a free list, in which the first chunk of size greater than or equal to the requested size is found:

/*
 * Allocate 'size' units from the given
 * map. Return the base of the allocated space.
 * In a map, the addresses are increasing and the list is terminated by a 0 size.
 * The core map unit is 64 bytes; the swap map unit is 512 bytes.
 * Algorithm is first-fit.
 */
malloc(mp, size)
struct map *mp;
{
        register unsigned int a;
        register struct map *bp;

        for (bp = mp; bp->m_size && ((bp - mp) < MAPSIZ); bp++) {
                if (bp->m_size >= size) {
                        /* first fit found: carve 'size' units off its front */
                        a = bp->m_addr;
                        bp->m_addr += size;
                        if ((bp->m_size -= size) == 0) {
                                /*
                                 * This entry is now empty: shift the rest of
                                 * the map down over it (note the intentional
                                 * assignment in the while condition -- copying
                                 * stops after the terminating 0 size is moved).
                                 */
                                do {
                                        bp++;
                                        (bp-1)->m_addr = bp->m_addr;
                                } while ((bp-1)->m_size = bp->m_size);
                        }
                        return(a);
                }
        }
        return(0);
}

(for the corresponding free() code, see lions-book-malloc.txt)

Note that the "struct map"s pointed to by bp can be allocated either "in-band" (right next to the memory chunks being managed) or "out-of-band", in a separate memory area.
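To make that distinction concrete, here is a minimal sketch of the two bookkeeping styles (not taken from any of the sources; both struct names are invented for illustration):

/*
 * In-band: the allocator's metadata sits immediately before the chunk it
 * describes, dlmalloc-style.  Overflowing the preceding chunk corrupts this
 * header -- the basis of the classic attacks in the Bonus section below.
 */
struct inband_hdr {
        size_t prev_size;       /* size of the chunk just before this one */
        size_t size;            /* size of the chunk following this header */
        /* ...the user's data follows immediately after... */
};

/*
 * Out-of-band: metadata lives in a separate array, like the BSD "struct map"s
 * above; the managed memory itself carries no allocator state, so overflowing
 * a chunk cannot clobber the bookkeeping.
 */
struct oob_map_entry {
        char   *m_addr;         /* start of a free extent */
        size_t  m_size;         /* its length; 0 terminates the list */
};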
OpenSolaris chooses to allocate similar structures "out-of-band", as explained in 11.3.4.1.

This legacy BSD "malloc" code has been made famous by the SCO case (in which SCO laid a claim to "intellectual property" in the Linux kernel): http://www.lemis.com/grog/SCO/code-comparison.html

============= OpenSolaris kernel's KMEM allocator =============

The KMEM allocator improved on legacy heaps like the one above on several fronts. It removed the need for much of the chunk bookkeeping by grouping objects of the same _size_ into dedicated "slab" pages; it saved on the initialization of objects by introducing the idea that a freed object nevertheless remains partially constructed and can be reused without reverting to a raw chunk that would need to be reinitialized from scratch; and it added a "fast path" in which objects are allocated on a per-CPU basis, so that the allocating CPU need not take a global lock (such as the lock on a freelist or on the list of boundary tags) every time, which would lock out all other CPUs from allocating objects of that kind until it's done.

Read bonwick01.pdf for more details, and also look at percpu-allocations.txt for the details of how per-CPU allocation is handled.

On your Illumos VM, you can list all KMEM caches with the ::kmem_cache DCMD. Look through that list for the kinds of objects that are allocated, then pick through a cache for a familiar structure. I suggest "process_cache", where proc_t structs come from (see also its definition and uses in http://src.illumos.org/source/search?refs=process_cache&path=%2Futs%2F&project=illumos-gate )

As a reminder, whenever you forget the exact DCMD names, grepping helps: e.g., "::dcmds ! grep kmem".

Suggestion: follow kmem_cache_alloc http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/kmem.c#2503 through the fast path (in which the per-CPU magazine layer delivers pointers to cached/constructed objects; this fast path is enclosed in an endless for-loop; note the continue vs. break) to the point where finding an already-constructed cached object fails and kmem_slab_alloc() is called to get a raw buffer:

2591 	/*
2592 	 * We couldn't allocate a constructed object from the magazine layer,
2593 	 * so get a raw buffer from the slab layer and apply its constructor.
2594 	 */
2595 	buf = kmem_slab_alloc(cp, kmflag);

See where the constructor is called on this buffer to make it into an object!

Then pick through the raw-buffer-maker kmem_slab_alloc() (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/kmem.c#kmem_slab_alloc) to see where the raw buffer "buf" comes from. This will bring you to kmem_slab_alloc_impl() and the KMEM_BUF macro, and will give you an idea of how data is kept within a slab.

============= OpenSolaris kernel's VMEM allocator =============

By contrast, OpenSolaris interfaces allow multiple named pools of memory with uniform properties per pool ("slabs" aka "kmem caches"; Vmem "arenas"). Essentially, a pool becomes a named object, in which allocation and deallocation functions become methods. Pools can be nested and configured to obtain new allocations from an enclosing pool object when necessary.

The textbook stresses the generalized character of the VMEM allocator in Ch. 11.3, pp. 552--553. As described, VMEM allocates subranges of integers of requested size within the initial range allocated at system boot. The integers are primarily meant to be address ranges (in particular, nested), but can also be integer ID ranges.
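To make the ID-range use concrete, here is a minimal sketch of a driver handing out minor numbers from a private arena, written against the vmem interface as it is summarized in bonwick01.pdf and on p. 554 of the textbook (the arena name, its bounds, and the foo_* function names are invented for illustration):

/* A hypothetical arena of IDs 1..999: base = 1, size = 999, quantum = 1,
 * no backing allocation/free functions, no source arena, no quantum caches.
 * Nothing here is byte-addressable memory -- the "addresses" handed back by
 * vmem_alloc() are simply unique integers. */
static vmem_t *foo_minor_arena;

void
foo_ids_init(void)
{
        foo_minor_arena = vmem_create("foo_minor_ids",
            (void *)1, 999, 1,          /* base, size, quantum */
            NULL, NULL, NULL, 0,        /* afunc, ffunc, source, qcache_max */
            VM_SLEEP);
}

minor_t
foo_id_alloc(void)
{
        /* allocate one integer out of the arena */
        return ((minor_t)(uintptr_t)vmem_alloc(foo_minor_arena, 1, VM_SLEEP));
}

void
foo_id_free(minor_t m)
{
        vmem_free(foo_minor_arena, (void *)(uintptr_t)m, 1);
}

A number of the vmem_create() call sites you will find in the search suggested below follow this pattern: quantum 1, no backing functions, and only the uniqueness of the returned integers matters.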
Read the bonwick01.pdf and bonwick94.pdf papers in the class directory (they overlap with the textbook text, but make some points better).

This is stressed by calling the allocated ranges "resources", not "addresses". Although the allocator includes some special functions that are address-aware (vmem_xalloc, in particular, controls address range "coloring" as in 10.2.7), they try to be as forgetful about the nature of the ranges as possible, and treat allocation as a general algorithmic problem of handing out integer intervals economically.

The initial range is ultimately derived either from the static per-platform kernel memory layout as in http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c#383 or from a fixed permissible range of IDs.

Page 554 summarizes the VMEM interface, explained in pp. 555--560. Read it before we start looking at the actual Vmem code.

To see how frequently & broadly this mechanism is used, search Illumos for vmem_init(), vmem_create(), and kmem_cache_create(). Note the hierarchical structure of the Vmem and Kmem objects being created; check that it corresponds to what you see with the ::vmem MDB command.

Core kernel memory allocation happens in http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.c . Read the top comment about the seg_kmem driver and the parts of the kernel heap, and follow the creation of the vmem_t objects declared at lines 103--122 inside kernelheap_init(). Also note the methods table (seg_ops) for this driver being declared & defined in seg_kmem.c at line 776. These methods working in concert with each other _are_ the driver. Note how seg->s_data is treated throughout these methods! (If you wonder what "kvp" is and why you cannot find this symbol in MDB, look at http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.h#70 and use the actual symbol "kvps" instead).

Some vmem arenas are also created in http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c (look for vmem_create()). To see the full list of VMEM arenas and their parentage, use the DCMD "::vmem".

VMEM arenas use a cute trick for allocating small chunks, up to a certain multiple of the arena's quantum (the unit of allocation by which VMEM increments its integers, explained in the VMEM API): namely, they fall back on KMEM caches specially created for chunks of these sizes! To wit: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/vmem.c#1262

1262 void *
1263 vmem_alloc(vmem_t *vmp, size_t size, int vmflag)
1264 {
1265 	vmem_seg_t *vsp;
1266 	uintptr_t addr;
1267 	int hb;
1268 	int flist = 0;
1269 	uint32_t mtbf;
1270
1271 	if (size - 1 < vmp->vm_qcache_max)
1272 		return (kmem_cache_alloc(vmp->vm_qcache[(size - 1) >>
1273 		    vmp->vm_qshift], vmflag & VM_KMFLAGS));

The vmem arena is created with an array of KMEM quantum caches for the different sizes (up to vm_qcache_max), and all smaller-sized allocations simply go to these KMEM caches' kmem_cache_alloc(). For example, assuming a 4096-byte quantum (so vm_qshift is 12) and quantum caches of 1--5 quanta, a 10000-byte request indexes vm_qcache[(10000 - 1) >> 12] == vm_qcache[2], i.e. the 3-quantum (12288-byte) cache.

Suggestion: locate these KMEM quantum caches for different VMEM arenas using the ::kmem_cache DCMD. Hint: look for strings in names used with kmem_cache_create() and vmem_create() via OpenGrok's "Full Search".

Note that KMEM caches themselves use VMEM to grab new slab pages, so these two allocators work closely together, each doing what it does best!
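To round out the picture from the consumer's side, here is a minimal sketch of a kernel subsystem creating and using its own object cache, written against the simplified interface described in bonwick94.pdf (the foo_t type, its lock, and all the function names are invented for illustration; the real Illumos kmem_cache_create() takes additional arguments, among them which vmem arena the cache's slabs are drawn from):

/* Prototypes follow the simplified form in bonwick94.pdf, not <sys/kmem.h>. */

/* A made-up object type whose expensive-to-initialize state (here, a mutex)
 * is set up once by the constructor and preserved across
 * kmem_cache_free()/kmem_cache_alloc() cycles. */
typedef struct foo {
        kmutex_t        foo_lock;
        int             foo_refcnt;
} foo_t;

static void
foo_constructor(void *buf, size_t size)
{
        foo_t *fp = buf;

        mutex_init(&fp->foo_lock, NULL, MUTEX_DEFAULT, NULL);
}

static void
foo_destructor(void *buf, size_t size)
{
        foo_t *fp = buf;

        mutex_destroy(&fp->foo_lock);
}

static kmem_cache_t *foo_cache;

void
foo_subsystem_init(void)
{
        /* One cache per object type: the slabs behind it hold only foo_t's. */
        foo_cache = kmem_cache_create("foo_cache", sizeof (foo_t), 0,
            foo_constructor, foo_destructor);
}

foo_t *
foo_hold(void)
{
        /* Fast path: a constructed object from the per-CPU magazine layer.
         * Slow path: a raw slab buffer with foo_constructor() applied. */
        return (kmem_cache_alloc(foo_cache, KM_SLEEP));
}

void
foo_rele(foo_t *fp)
{
        /* The object goes back still constructed; the destructor runs only
         * when the cache eventually returns the slab to its vmem source. */
        kmem_cache_free(foo_cache, fp);
}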
========[ Bonus: Heap exploitation ]========

The two classic Phrack papers that exploited malloc's in-band boundary tags:

Phrack 57:9 "Once upon a free()": http://www.phrack.com/issues.html?issue=57&id=9
Phrack 57:8 "Vudo malloc tricks": http://www.phrack.com/issues.html?issue=57&id=8

A summary of these and other exploitation techniques:

Phrack 61:6 "Advanced Doug lea's malloc exploits": http://www.phrack.com/issues.html?issue=61&id=6
(the above referring to http://g.oswego.edu/dl/html/malloc.html)

--- Recent Advances ---

"Heap Feng-shui" by Alex Sotirov: http://www.blackhat.com/presentations/bh-europe-07/Sotirov/Presentation/bh-eu-07-sotirov-apr19.pdf
(Explains how the memory allocator can be treated as an environment to be programmed with a series of memory allocation requests, shaping the heap's chunks into configurations amenable to attack.)

Phrack 68:10 "Pseudomonarchia jemallocum / The false kingdom of jemalloc, or On exploiting the jemalloc memory manager", argp & huku: http://www.phrack.org/issues.html?issue=68&id=10
(Features exploitation of external, "out-of-band" boundary tags!)

==================[ Bonus: memory allocation in Linux ]====================

A discussion of the glibc memory allocator, which is thread-aware and goes well beyond DL-malloc: https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/

The 2.6 Linux generic kernel memory allocator API is described here: http://www.linuxjournal.com/article/6930

Note that kmalloc() is the function shared by all of the kernel's non-slab allocations (slabs are handled differently and are closer to the OpenSolaris KMEM allocator of Ch. 11.2, without the extra "magazine" and "depot" layers). Flags from Table 4 in the above link determine whether a particular allocation can or cannot block, and also distinguish between several purposes of allocated memory.
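As a minimal sketch of the blocking/non-blocking distinction (the struct and function names here are invented; kmalloc(), kfree(), and the GFP flags are the real Linux kernel API):

#include <linux/slab.h>         /* kmalloc(), kfree(), GFP_* flags */

struct foo_pkt {
        char data[128];
};

/* Process context: GFP_KERNEL allocations are allowed to sleep (e.g., wait
 * for pages to be reclaimed), so they must not be used while holding a
 * spinlock or in interrupt context. */
static struct foo_pkt *foo_alloc_sleeping(void)
{
        return kmalloc(sizeof(struct foo_pkt), GFP_KERNEL);
}

/* Interrupt context: GFP_ATOMIC never blocks; it either succeeds immediately
 * from the reserves or returns NULL, which the caller must handle. */
static struct foo_pkt *foo_alloc_atomic(void)
{
        return kmalloc(sizeof(struct foo_pkt), GFP_ATOMIC);
}

static void foo_release(struct foo_pkt *p)
{
        kfree(p);
}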