========== Illumos Virtual Memory Design Overview ==========

The textbook covers different aspects of the VM system in Chapters 9--12.

There are two ways to look at memory: "top-down", from the proc_t's
"struct as", which contains the information on the mapped ("valid")
address ranges in the process's virtual address space, towards the virtual
pages and then, through the hardware address translation, to the physical
pages; and "bottom-up", from the physical pages towards the files or
"anonymously" allocated memory structures these pages are used to store.

1. Top-down: From a virtual address space as seen by a process or the
kernel context (cf. "kas", the kernel's global "struct as" for the
kernel's virtual address space, as in Fig. 11.2 p. 533):

    proc_t.p_as -> "struct as" -> an AVL tree of "struct seg"

"struct seg"s are managed by "segment drivers", of which the seg_vn driver
does the work of mmap-ing files and anonymous memory allocations. These
segment drivers operate on driver-specific "s_data" members. Seg_vn's
s_data format is "struct segvn_data".

Observe that the "struct seg" contains both a pointer to driver-specific
data, "s_data", *and* a pointer to the driver-specific functions that
operate on this data (and thus know its format, and do the right thing),
"s_ops". In OO terms, these are the "instance members" and "methods" of
the segment driver class. Moreover, each segment driver such as seg_vn
could be viewed as a derived class of the abstract class "seg", with s_ops
as its "virtual methods" (whereas functions acting on "seg"s would be
non-virtual).

For seg_vn mappings of *files*, the file (more precisely, its vnode) is
located through

    seg.s_data -> segvn_data.vp

and the offset in this file is segvn_data.offset -- Fig. 9.10 p. 483

For seg_vn mappings of *anonymous* memory (described in Ch. 9.6-7) the
picture is much more complicated, because of the need to keep

  (a) the information about the logical extent of the allocated memory
      chunks -- cf. "struct anon" for each page chunk, "struct anon_map"
      for sharing whole anonymous segments between application processes,
      and everything in-between. See mdb-walk-anon-mmap.txt for details.

  (b) the information about its location in "swapfs" (see Ch. 9.8), where
      it could be swapped out, and

  (c) the uniform scheme of unique "identity" for every allocated physical
      page, i.e., the ability to locate that physical page by the unique
      <vnode, offset> pair associated with it via "page_hash".
      Note: If the "identity" vnode is not present naturally, it is
      created specifically for the purpose of identifying and referencing
      a group of physical pages. See Fig. 9.11 p. 486.

Thus we get from virtual addresses and semantically different contiguous
areas of virtual memory (.text vs .data vs heap vs stack, etc., which all
have different purposes and need to be treated differently by the OS
according to their intended functionality) to vnodes and offsets. This is,
of course, an expression of the fundamental UNIX philosophy that
"everything is a file" (more precisely, that the resources programs act on
are all accessible through a system-global namespace with "paths" and
"filenames", and allow a universal set of operations such as "open",
"read", "write", "seek" and so on that treat the resource as a stream of
bytes, plus some special operations performed via the "ioctl" interface).

The <vnode, offset> identity of memory segments/regions/pages is where the
virtual space-side view of memory meets the physical page-side view that
connects the hardware memory management with OS abstractions.

2. Bottom-up: From the physical pages to the respective OS objects to
which they are allocated.

For each physical page, Illumos maintains one page_t ("struct page").
The array/list of these structs serves many purposes at once, each use
supported by different page_t * pointer members of page_t, defined in
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/page.h#498
(see the long commentary above) and explained in Ch. 10.2; in particular,
refer to Fig. 10.2 and Fig. 10.3 p. 506-7:

  (a) global page_hash to map the <vnode, offset> "identity" --> physical
      page frame number (PFN) (page_t.p_hash implements the collision
      lists)

  (b) list of pages corresponding to a vnode, for pages associated with a
      vnode, either as a result of file mmap-ing, or for specially created
      "identity" vnodes such as "anonymous" memory allocations within a
      process or within the kernel. This is done through the p_vnode,
      p_vpnext, and p_vpprev members, and "struct vnode"'s v_pages member
      that points to a page in the page_t list connected with the
      p_vpprev and p_vpnext pointers.

  (c) freelist of available pages no longer associated with any valid
      file chunk contents. The "freelist" is actually several lists of
      pages, freelist[][][][], due to the cache-related "coloring"
      optimizations (Ch. 10.2.7) and CPU support for different page sizes
      (Ch. 9.10)

  (d) cachelist of pages that are available for allocation but still
      contain valid file chunk contents from a previous file mapping. In
      other words, these pages can be reused if the file is opened again
      (e.g., by another process). This is what makes it a *cache*: when a
      file is read or written at an offset for which the file's vnode and
      the offset as an "identity" resolve to a valid physical page, that
      page need not be loaded from the disk, being already in RAM.
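
To make the meeting point of the two views concrete, here is a minimal
user-space sketch in C. It is not the illumos implementation. The structs
are heavily reduced stand-ins that keep only the member names mentioned
above (s_ops, s_data, vp, offset, p_vnode, p_hash), plus p_offset, s_base
and s_size; everything prefixed with "sk_" or "SK_" (including the single
sk_identity method and the fixed-size hash table) is a hypothetical name
introduced for illustration. It handles only file-backed seg_vn mappings
and starts from a segment that has already been found, so the AVL tree
lookup in "struct as" and all anonymous memory handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_HASH_SZ 64   /* hypothetical table size */

struct seg;

struct vnode {
    const char *v_name;     /* hypothetical label, for the demo only */
};

struct page {               /* reduced page_t */
    struct vnode *p_vnode;  /* identity: the vnode ... */
    uint64_t p_offset;      /* ... and the offset within it */
    struct page *p_hash;    /* page_hash collision chain */
};

struct seg_ops {            /* "virtual methods" of a segment driver */
    /* hypothetical method: report the identity backing an address */
    int (*sk_identity)(const struct seg *, uintptr_t,
        struct vnode **, uint64_t *);
};

struct seg {
    uintptr_t s_base;       /* first virtual address of the segment */
    size_t s_size;          /* length in bytes */
    const struct seg_ops *s_ops;
    void *s_data;           /* driver-private, e.g. struct segvn_data */
};

struct segvn_data {
    struct vnode *vp;       /* mapped file */
    uint64_t offset;        /* file offset that corresponds to s_base */
};

struct page *sk_page_hash[SK_HASH_SZ];

/* seg_vn's method: only the driver knows s_data is a struct segvn_data. */
int sk_segvn_identity(const struct seg *seg, uintptr_t addr,
    struct vnode **vpp, uint64_t *offp)
{
    const struct segvn_data *svd = (const struct segvn_data *)seg->s_data;

    if (addr < seg->s_base || addr - seg->s_base >= seg->s_size)
        return -1;          /* address outside this segment */
    *vpp = svd->vp;
    *offp = svd->offset + (addr - seg->s_base);
    return 0;
}

const struct seg_ops sk_segvn_ops = { sk_segvn_identity };

size_t sk_hash(const struct vnode *vp, uint64_t off)
{
    return (size_t)((((uintptr_t)vp >> 4) ^ (off >> 12)) % SK_HASH_SZ);
}

/* Give a page its identity and enter it in the hash. */
void sk_page_hashin(struct page *pp, struct vnode *vp, uint64_t off)
{
    size_t h = sk_hash(vp, off);

    pp->p_vnode = vp;
    pp->p_offset = off;
    pp->p_hash = sk_page_hash[h];   /* push onto the collision chain */
    sk_page_hash[h] = pp;
}

/* Bottom-up lookup: identity -> page, walking the collision chain. */
struct page *sk_page_find(const struct vnode *vp, uint64_t off)
{
    struct page *pp;

    for (pp = sk_page_hash[sk_hash(vp, off)]; pp != NULL; pp = pp->p_hash)
        if (pp->p_vnode == vp && pp->p_offset == off)
            return pp;
    return NULL;
}

/* Top-down meets bottom-up: address -> identity -> page (NULL if absent). */
struct page *sk_addr_to_page(const struct seg *seg, uintptr_t addr,
    uint64_t pagesize)
{
    struct vnode *vp;
    uint64_t off;

    assert(pagesize != 0);
    if (seg->s_ops->sk_identity(seg, addr, &vp, &off) != 0)
        return NULL;
    return sk_page_find(vp, off - (off % pagesize));
}
```

The caller of sk_addr_to_page() never looks inside s_data; it only calls
through s_ops, which is the "virtual method" dispatch described in item 1.
A NULL result from the hash lookup corresponds to the case where no
physical page currently carries that identity, so the contents would have
to be brought in from the backing file.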