---------------------[ Memory ordering and barriers ]---------------------

Looking through the mutex_exit() implementation, you may have noticed that
mutex_exit() does not contain any atomic instructions: there is no LOCK
prefix, and no instruction that is implicitly atomic or bus-locking (such
as XCHG with a memory operand).

698     ENTRY(mutex_exit)
699 mutex_exit_critical_start:          /* If interrupted, restart here */
700     movq    %gs:CPU_THREAD, %rdx    // current thread ptr
701     cmpq    %rdx, (%rdi)            // NOT atomic, no LOCK
702     jne     mutex_vector_exit       /* wrong type or wrong owner */
703     movq    $0, (%rdi)              /* clear owner AND lock */
704 .mutex_exit_critical_end:
705 .mutex_exit_lockstat_patch_point:
706     ret
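For comparison, here is a rough C11 rendering of the same fast path. It is a minimal sketch, not illumos code. The lock layout (owner pointer with a low "waiters" bit), sketch_mutex_t, current_thread() and sketch_mutex_vector_exit() are all hypothetical stand-ins. The point is that acquiring needs an atomic compare-and-swap, while releasing is an ordinary load, compare and store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified lock word: owner thread pointer, with the low
 * bit used as a "waiters" flag (thread pointers are aligned, so bit 0 is
 * free). This is NOT the illumos kmutex_t layout. */
typedef struct {
    _Atomic uintptr_t owner;
} sketch_mutex_t;

#define SKETCH_WAITERS ((uintptr_t)1)

/* Stand-in for curthread: the address of a thread-local variable. */
static _Thread_local int self_tag;
static uintptr_t current_thread(void) { return (uintptr_t)&self_tag; }

static int slow_exit_calls;  /* counts slow-path exits, for illustration */

/* Stand-in for the slow path: a kernel would also wake waiters here. */
static void sketch_mutex_vector_exit(sketch_mutex_t *m) {
    uintptr_t v = atomic_load_explicit(&m->owner, memory_order_relaxed);
    assert((v & ~SKETCH_WAITERS) == current_thread());  /* must be owner */
    slow_exit_calls++;
    atomic_store_explicit(&m->owner, 0, memory_order_release);
}

/* Acquiring does need an atomic read-modify-write (LOCK CMPXCHG on x86).
 * This sketch simply spins; a real adaptive mutex would spin or block. */
static void sketch_mutex_enter(sketch_mutex_t *m) {
    uintptr_t expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&m->owner, &expected,
               current_thread(), memory_order_acquire, memory_order_relaxed))
        expected = 0;
}

/* Mirrors the assembly: an ordinary compare, then an ordinary store. The
 * window between the load and the store is the race that the mutex.c
 * comment has to reason about. */
static void sketch_mutex_exit(sketch_mutex_t *m) {
    if (atomic_load_explicit(&m->owner, memory_order_relaxed) !=
        current_thread()) {
        sketch_mutex_vector_exit(m);  /* waiters bit set, or wrong owner */
        return;
    }
    /* Release store: x86-64 compilers typically emit a plain MOV here. */
    atomic_store_explicit(&m->owner, 0, memory_order_release);
}
```

The release store at the end of sketch_mutex_exit() plays the role of the movq on line 703: on x86 ordinary stores already have release ordering, so no LOCK prefix or fence is needed for it.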

But aren't synchronization primitives supposed to be atomic? Modern
processors may reorder memory loads and stores quite liberally (see the
links on memory reordering below), so when two CPUs of a multiprocessor
system access the lock's memory at the same time, the results may be
surprising.
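The classic "store buffering" litmus test makes this concrete. Below is a sketch assuming a POSIX system with pthread barriers; all names are made up for the example. Each of two threads stores 1 to its own flag and then loads the other's flag. Under any interleaving of the four operations at least one thread must read 1, yet real hardware (x86 included) may let a load complete before the same thread's earlier store is visible, so both can read 0. A full fence (atomic_thread_fence(memory_order_seq_cst), an MFENCE on x86) between the store and the load rules that outcome out.

```c
#define _POSIX_C_SOURCE 200809L  /* for pthread_barrier_t */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Store-buffering litmus test. Error checking omitted for brevity. */
static atomic_int x, y;          /* the two shared flags */
static int r1, r2;               /* what each thread observed */
static int iters, use_fence;
static pthread_barrier_t start, done;

static void *thread0(void *arg) {
    (void)arg;
    for (int i = 0; i < iters; i++) {
        pthread_barrier_wait(&start);
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        if (use_fence)
            atomic_thread_fence(memory_order_seq_cst);  /* full membar */
        r1 = atomic_load_explicit(&y, memory_order_relaxed);
        pthread_barrier_wait(&done);
    }
    return NULL;
}

static void *thread1(void *arg) {
    (void)arg;
    for (int i = 0; i < iters; i++) {
        pthread_barrier_wait(&start);
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        if (use_fence)
            atomic_thread_fence(memory_order_seq_cst);  /* full membar */
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
        pthread_barrier_wait(&done);
    }
    return NULL;
}

/* Runs n rounds; returns how many ended with r1 == 0 && r2 == 0, an outcome
 * impossible under any sequentially consistent interleaving. */
static int run_litmus(int n, int fence) {
    pthread_t t0, t1;
    int both_zero = 0;
    iters = n;
    use_fence = fence;
    pthread_barrier_init(&start, NULL, 3);
    pthread_barrier_init(&done, NULL, 3);
    pthread_create(&t0, NULL, thread0, NULL);
    pthread_create(&t1, NULL, thread1, NULL);
    for (int i = 0; i < n; i++) {
        atomic_store(&x, 0);             /* reset flags between rounds */
        atomic_store(&y, 0);
        pthread_barrier_wait(&start);    /* release both threads */
        pthread_barrier_wait(&done);     /* wait for both results */
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_barrier_destroy(&start);
    pthread_barrier_destroy(&done);
    return both_zero;
}
```

With fences, run_litmus() must return 0. Without them, how many rounds show the reordering depends on the machine and on timing; a count of 0 on a particular run proves nothing.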

Indeed, the comment at line 63
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/mutex.c#63
explains the problem: a thread might block on a mutex that has just been
released, and since the mutex_exit() fast path never wakes anyone, nothing
would ever wake it up!

The comment goes on to explain that the problem can be solved without an
atomic instruction in mutex_exit(), and describes the solution. Note
lines 124-126:

124 * It has been verified by exhaustive simulation that all possible global
125 * memory orderings of (2M) interleaved with (3M) result in correct
126 * behavior.  
...

This is quite amazing and contrary to most textbook examples, but that's
how it works!
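A toy version of such an exhaustive search fits in a few dozen lines. The sketch below is an illustration under strong simplifying assumptions, not the protocol from mutex.c: the exiting thread E runs the two-step fast path (plain compare, plain store), and a thread W tries to take the lock and, if it is held, sets a waiters bit and then naively goes to sleep with no further check. It enumerates sequentially consistent interleavings only, not the weaker hardware orderings the comment refers to, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, hypothetical names; not illumos code. */
#define OWNER   ((uintptr_t)0x100)  /* E's "thread pointer" */
#define WAITER  ((uintptr_t)0x200)  /* W's "thread pointer" */
#define WAITERS ((uintptr_t)1)      /* waiters bit in the lock word */

struct state {
    uintptr_t lock;   /* shared lock word */
    int e_pc, w_pc;   /* next step of E and W; 2 means finished */
    int w_acquired;   /* W found the lock free and took it */
    int w_asleep;     /* W went to sleep */
    int woken;        /* E's slow path issued a wakeup */
};

static void step_e(struct state *s) {
    if (s->e_pc == 0) {               /* cmp: owner with no waiters? */
        if (s->lock != OWNER) {       /* no: slow path clears and wakes */
            s->lock = 0;
            s->woken = 1;
            s->e_pc = 2;
        } else {
            s->e_pc = 1;
        }
    } else {                          /* store 0: clears owner AND waiters */
        s->lock = 0;
        s->e_pc = 2;
    }
}

static void step_w(struct state *s) {
    if (s->w_pc == 0) {               /* one atomic step: take or mark */
        if (s->lock == 0) {
            s->lock = WAITER;
            s->w_acquired = 1;
        } else {
            s->lock |= WAITERS;
        }
        s->w_pc = 1;
    } else {                          /* naive: block without rechecking */
        if (!s->w_acquired)
            s->w_asleep = 1;
        s->w_pc = 2;
    }
}

/* Depth-first search over all interleavings of E's and W's steps.
 * Counts finished runs in *total; returns how many lost a wakeup. */
static int explore(struct state s, int *total) {
    if (s.e_pc == 2 && s.w_pc == 2) {
        assert(!(s.w_acquired && s.w_asleep));
        (*total)++;
        return s.w_asleep && !s.woken;
    }
    int lost = 0;
    if (s.e_pc < 2) {
        struct state t = s;
        step_e(&t);
        lost += explore(t, total);
    }
    if (s.w_pc < 2) {
        struct state t = s;
        step_w(&t);
        lost += explore(t, total);
    }
    return lost;
}

static int count_lost_wakeups(int *total) {
    struct state init = { .lock = OWNER };  /* E holds it, no waiters */
    *total = 0;
    return explore(init, total);
}
```

In this model the search finds 5 distinct runs, and 2 of them lose the wakeup: exactly the runs where W sets the waiters bit after E's compare but before E's store.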

This explanation mentions memory barriers, or membars (instructions that
forbid certain reorderings of memory operations). More on these:

---------------------[ On memory reordering & membars ]---------------------

A short and simple explanation:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaw/ordering.2006.03.13a.pdf

An in-depth explanation (you should read this before you graduate!):
https://www.akkadia.org/drepper/cpumemory.pdf

Linux kernel doc on memory ordering & memory barriers:
https://www.kernel.org/doc/Documentation/memory-barriers.txt

---------------------[ A side-note on preemption ]---------------------

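So we can do without a LOCK. But what about preemption? That is, what if
an interrupt hits the CPU on which mutex_exit() is running after the cmpq
on line 701 but before the movq on line 703, which clears the lock? If the
thread is preempted there, another thread may change the lock word (for
example, by setting the waiters bit) before the movq runs and wipes it out.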

Note the labels mutex_exit_critical_start and .mutex_exit_critical_end.
The comment at
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/ml/lock_prim.s#512
explains how they are used to avoid this race condition without disabling
interrupts: when an interrupt is taken, the interrupt-handling code checks
whether the interrupted program counter lies between these two labels and,
if so, moves the PC back to the beginning of mutex_exit(), so that the
comparison is redone when the thread resumes:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/intr.c#1495
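The restart trick can be modeled in a few lines of C. In this toy (hypothetical names throughout; it is not the intr.c code), the fast path is two "instructions" addressed by a fake PC, one interrupt arrives just before a chosen instruction, and we assume that while the owner is off the CPU a waiter sets the waiters bit. The fixup uses an unsigned-subtraction range check to move the PC back to the start of the critical region.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, hypothetical names: the two instructions of a mutex_exit-style
 * fast path, addressed by a fake program counter. */
#define OWNER   ((uintptr_t)0x100)
#define WAITERS ((uintptr_t)1)

enum { PC_CMP = 0, PC_STORE = 1, PC_RET = 2 };
static const int crit_start = PC_CMP;  /* like mutex_exit_critical_start */
static const int crit_end = PC_RET;    /* first PC past the critical region */

struct cpu {
    int pc;
    uintptr_t lock;
    int woken;        /* slow path ran and would wake waiters */
};

static void execute(struct cpu *c) {
    switch (c->pc) {
    case PC_CMP:                      /* cmpq %rdx, (%rdi); jne slow path */
        if (c->lock != OWNER) {
            c->lock = 0;
            c->woken = 1;
            c->pc = PC_RET;
        } else {
            c->pc = PC_STORE;
        }
        break;
    case PC_STORE:                    /* movq $0, (%rdi) */
        c->lock = 0;
        c->pc = PC_RET;
        break;
    default:
        break;
    }
}

/* Interrupt-time fixup: if the interrupted PC is inside the critical region,
 * restart the region. The unsigned subtraction folds both bounds checks
 * into a single comparison. */
static void interrupt_fixup(struct cpu *c) {
    if ((unsigned)(c->pc - crit_start) < (unsigned)(crit_end - crit_start))
        c->pc = crit_start;
}

/* Runs the fast path with one interrupt taken just before the instruction
 * at irq_pc. While the owner is off the CPU, a waiter sets the waiters bit.
 * Returns 1 if that waiter would be left sleeping with no wakeup. */
static int lost_wakeup(int irq_pc, int with_fixup) {
    struct cpu c = { .pc = PC_CMP, .lock = OWNER, .woken = 0 };
    int irq_taken = 0;
    while (c.pc != PC_RET) {
        if (c.pc == irq_pc && !irq_taken) {
            irq_taken = 1;
            assert(c.lock == OWNER);  /* still held, so the waiter blocks */
            c.lock |= WAITERS;
            if (with_fixup)
                interrupt_fixup(&c);
        }
        execute(&c);
    }
    return irq_taken && !c.woken;
}
```

An interrupt just before the store loses the wakeup without the fixup and is harmless with it, because the restarted compare sees the waiters bit and takes the slow path. If the interrupt does not actually lead to a context switch, restarting merely repeats the compare.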

---------------------[ Misc links ]---------------------

Discussion of systemd design failures:
http://ewontfix.com/14/ , http://ewontfix.com/15/

Example of file descriptors leaking at close() due to thread cancellation:
http://ewontfix.com/2/ , http://ewontfix.com/4/