1. System calls as the centerpiece of a Unix kernel.

All privileged operations in Unix are performed on behalf of user processes by "system call" code located in the kernel. The data that this code operates on is also located in the kernel and can only be accessed directly when the CPU is in "kernel mode". This ensures that user processes get to use this code only as a "package deal", with the up-front permission and sanity checks being part of the package. This mechanism is the basis of the OS's stability and security.

A short blog summary with nice pictures: http://duartes.org/gustavo/blog/post/system-calls/

2. Some Linux details:

User-level code accesses syscall code through the so-called "call gate" mechanism: it places the number of the desired call in a register (EAX on Linux/x86), places arguments or pointers to arguments in other registers (EBX, ECX, EDX, ... on 32-bit Linux), and executes the "int 0x80" instruction (older 32-bit systems) or the "syscall" or "sysenter" instruction (newer and 64-bit systems). Note that the system call function is accessed only by its number, not by its address: user-level code cannot "jump" or "call" to a kernel address (if it tries, a segfault occurs).

The "int 0x80" instruction simultaneously puts the CPU into kernel mode ("ring 0") and transfers control to the address stored in slot 0x80 of the x86 CPU's Interrupt Descriptor Table (which is pointed to by the CPU's special IDTR register). That address is *the single entry point* for all system calls.

Look at the nice Fig. 1 in this IBM developer article on syscalls: http://www.ibm.com/developerworks/linux/library/l-system-calls/

Look at ENTRY(system_call) in an older Linux kernel: http://www.cs.dartmouth.edu/~sergey/cs108/rootkits/entry.S

Note the "call *sys_call_table(,%eax,4)" instruction.
In "call *sys_call_table(,%eax,4)", EAX holds the system call number (per the Linux syscall calling convention) and 4 is the pointer size on 32-bit systems, so the instruction simply calls the implementation of the system call through the function pointer in the EAX-th slot of sys_call_table, which is a table of function pointers.

In the above file, the common entry point for system calls is ENTRY(system_call) at line 241. (The ENTRY macro creates a symbol for the linker to pick up later; you can see this symbol in your /boot/System.map file, which contains all public kernel symbols.)

Note the saving of all userland registers by the SAVE_ALL macro, and the switch to the fixed value __USER_DS in the data segment selectors (%ds, %es). Also note how the original values pushed onto the stack are restored by the RESTORE_* macros. (For now, ignore the code that gets emitted into other sections (between ".section .fixup,"ax";" and ".previous"), as it is not meant to run on a normal system call path.) Note that %ds and %es are restored, too.

Note the all-important IRET instruction that returns control to userland, restoring (by popping them from the stack) not only EIP, which points to the place in the userland program right past the "int 0x80" that caused the system call to go into the kernel, but also the CS and EFLAGS registers (and, because the privilege level changes, ESP and SS as well). Read about the IRET, SYSEXIT, and SYSRET instructions in the Intel manual.

Intel's summary of the registers involved in setting up system calls: http://www.cs.dartmouth.edu/~sergey/cs258/ia32-system-registers-and-data-structs.pdf

3.
OpenSolaris/Illumos

Syscall numbers exposed in Solaris in: /etc/name_to_sysnum

Syscall numbers defined in: /usr/src/uts/common/sys/syscall.h
(http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/syscall.h)

Syscalls dispatched in: /usr/src/uts/intel/ia32/os/syscall.c
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/os/syscall.c

Observe: dosyscall() gets the address of the requested syscall function by "code" in syscall_entry(), then executes it through the function pointer (lines 896--898).

System call table: /usr/src/uts/common/os/sysent.c
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/sysent.c

Observe: Line 439 and below, struct sysent sysent[NSYSCALL] = ...

4. A simple syscall: getpid()

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/syscall/getpid.c#42

Looks up the PID via curthread, the pointer to the current thread descriptor: it follows the pointer to the process structure of type proc_t, then reads the integer PID value from that structure.

Kernel struct that keeps process data (along with some others, explained briefly on pp. 44--48 of the textbook, details in Section 2.4): http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/proc.h#130

Suggestion: explore proc_t for the links between process structs. How many other proc_t's are linked to it and why? (Many...)

5. Reading kernel code

The OpenGrok code browsing system is at http://src.illumos.org/source/

Kernel code "lives" under project "illumos-gate", under the path /illumos-gate/usr/src/uts/ (note UTS, which stands for Unix Time-Sharing, a legacy name).

Before we start reading kernel code in earnest, here are some idioms.
==== Functions defined in assembly ====

The ENTRY_* macros create function symbols that the linker will treat as normal C functions (when C functions are compiled into assembly, similar assembly is generated for them, too):

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/sys/asm_linkage.h#210

    #define ENTRY_NP(x) \
        .text; \                      <--- place in .text (code) section
        .align ASM_ENTRY_ALIGN; \     <--- align at an ASM_ENTRY_ALIGN-byte boundary
        .globl x; \                   <--- make the macro's arg a global symbol...
        .type x, @function; \         <--- ...of type "function"
    x:                                <--- here it starts...

==== Getting pointer to current thread ====

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/asm/thread.h -- thread pointer context:

    extern __inline__ struct _kthread *threadp(void)
    {
        void *__value;

    #if defined(__amd64)
        __asm__ __volatile__(
            "movq %%gs:0x18,%0"    /* CPU_THREAD */
            : "=r" (__value));
    #elif defined(__i386)
        __asm__ __volatile__(
            "movl %%gs:0x10,%0"    /* CPU_THREAD */
            : "=r" (__value));
    #else
    #error "port me"
    #endif
        return (__value);
    }

For explanations of the __asm__ embedding of assembly into gcc C code, see http://www.ibm.com/developerworks/library/l-ia.html, or http://www.cs.virginia.edu/~clc5q/gcc-inline-asm.pdf (local copy: gcc-inline-asm.pdf) for more details.

(Note: for functions that include assembly, the kernel contains a "__lint" version of the code that does not actually get built but keeps the "lint" checker happy. For more info, see the manpage of "lint".)

(Many macros in /illumos-gate/usr/src/uts/common/sys/thread.h are nice and readable; the key point is threadp(), which is CPU-dependent.)

For explanations of "extern __inline__" see (**).
Here is how threadp() is used:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/thread.h#528

    extern kthread_t *threadp(void);    /* inline, returns thread pointer */
    #define curthread (threadp())             /* current thread pointer */
    #define curproc   (ttoproc(curthread))    /* current process pointer */
    #define curproj   (ttoproj(curthread))    /* current project pointer */
    #define curzone   (curproc->p_zone)       /* current zone pointer */

cf. the getpid() code:

    int64_t
    getpid(void)
    {
        rval_t r;
        proc_t *p;

        p = ttoproc(curthread);    <--- reads the current thread pointer off %gs;
                                        the system call entry code makes sure
                                        %gs is set up correctly for the process
                                        on behalf of which the system call is
                                        made, so that curthread leads to the
                                        right proc_t.
        r.r_val1 = p->p_pid;
        if (p->p_flag & SZONETOP)
            r.r_val2 = curproc->p_zone->zone_zsched->p_pid;
        else
            r.r_val2 = p->p_ppid;
        return (r.r_vals);
    }

================================================================

(**) extern __inline__ explained:

http://publib.boulder.ibm.com/infocenter/compbgpl/v9v111/index.jsp?topic=/com.ibm.xlcpp9.bg.doc/language_ref/cplr243.htm --

"If you specify the __inline__ keyword, with the trailing underscores, the compiler uses the GNU C semantics for inline functions. In contrast to the C99 semantics, a function defined as __inline__ provides an external definition only; a function defined as static __inline__ provides an inline definition with internal linkage (as in C99); and a function defined as extern __inline__, when compiled with optimization enabled, allows the co-existence of an inline and external definition of the same function. For more information on the GNU C implementation of inline functions, see the GCC documentation, available at http://gcc.gnu.org/onlinedocs/."

Why all this? See http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/ml/i86_subr.s#2402 -- a different definition in another file, and yet no linking problem.