The getpid() system call would appear to be one of the simplest system
calls. After all, it only needs to get one integer out of the kernel,
and takes no arguments that the library call getpid() might need to
prepare, check, or rewrite before the syscall---unlike, for example,
the family of exec* calls (see "man 3 exec") that all lead to the same
system call execve ("man 2 execve") after different handling of their
arguments.

Yet even getpid's libc implementation harbors surprises. Let's
statically compile a small program that calls it (exec.c here) and look
at the result. The specific instructions may be different on your
machine, but working through them will likely be similar enough. (I am
using gcc 4.6.1 and Glibc 2.13 on a 32-bit Ubuntu distro and kernel;
things will be different on 64-bit ones, but should be recognizable
nevertheless.)

sergey@toy32:~$ gcc -static -o exec-stat exec.c 
sergey@toy32:~$ objdump -d exec-stat | less

The dump is, of course, quite large, as it includes all the
functions needed for printf() and everything else out of libc
that is relevant to a self-contained statically linked process.
We'll find main(), and then work from there:

08048cb0 <main>:
 8048cb0:       55                      push   %ebp
 8048cb1:       89 e5                   mov    %esp,%ebp
 8048cb3:       83 e4 f0                and    $0xfffffff0,%esp
 8048cb6:       83 ec 20                sub    $0x20,%esp
 8048cb9:       c7 44 24 18 68 f4 0a    movl   $0x80af468,0x18(%esp)
 8048cc0:       08 
 8048cc1:       c7 44 24 1c 00 00 00    movl   $0x0,0x1c(%esp)
 8048cc8:       00 
 8048cc9:       e8 02 aa 00 00          call   80536d0 <__getpid>   // <---
 8048cce:       ba 70 f4 0a 08          mov    $0x80af470,%edx
 8048cd3:       89 44 24 04             mov    %eax,0x4(%esp)
 8048cd7:       89 14 24                mov    %edx,(%esp)
 8048cda:       e8 01 0b 00 00          call   80497e0 <_IO_printf>
 8048cdf:       c7 04 24 3c 00 00 00    movl   $0x3c,(%esp)
 8048ce6:       e8 d5 a6 00 00          call   80533c0 <__sleep>
 8048ceb:       8d 44 24 18             lea    0x18(%esp),%eax
 8048cef:       89 44 24 04             mov    %eax,0x4(%esp)
 8048cf3:       c7 04 24 68 f4 0a 08    movl   $0x80af468,(%esp)
 8048cfa:       e8 a1 a9 00 00          call   80536a0 <execv>
 8048cff:       c9                      leave  
 8048d00:       c3                      ret    
 8048d01:       90                      nop
 8048d02:       90                      nop
 8048d03:       90                      nop

Skipping to __getpid:

080536d0 <__getpid>:
 80536d0:       65 8b 15 6c 00 00 00    mov    %gs:0x6c,%edx
 80536d7:       83 fa 00                cmp    $0x0,%edx
 80536da:       7e 04                   jle    80536e0 <__getpid+0x10>
 80536dc:       89 d0                   mov    %edx,%eax
 80536de:       f3 c3                   repz ret 
 80536e0:       75 10                   jne    80536f2 <__getpid+0x22>
 80536e2:       65 a1 68 00 00 00       mov    %gs:0x68,%eax
 80536e8:       85 c0                   test   %eax,%eax
 80536ea:       8d b6 00 00 00 00       lea    0x0(%esi),%esi
 80536f0:       75 ec                   jne    80536de <__getpid+0xe>
 80536f2:       b8 14 00 00 00          mov    $0x14,%eax         // syscall number of sys_getpid()
 80536f7:       ff 15 bc 60 0d 08       call   *0x80d60bc         // <----
 80536fd:       85 d2                   test   %edx,%edx
 80536ff:       89 c1                   mov    %eax,%ecx
 8053701:       75 db                   jne    80536de <__getpid+0xe>
 8053703:       65 89 0d 68 00 00 00    mov    %ecx,%gs:0x68
 805370a:       c3                      ret    
 805370b:       90                      nop
 805370c:       90                      nop
 805370d:       90                      nop
 805370e:       90                      nop
 805370f:       90                      nop

(Look up 32-bit syscall numbers at http://syscalls.kernelgrok.com , 64-bit ones
at http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64)

First, where is the actual system call? There seems to be no sign of
the "int 0x80" or "sysenter". Well, it hides behind the 
"call *0x80d60bc". The target of this call will, of course, be found at
runtime at the memory address 0x80d60bc, so we can't get it from the
static disassembly, and need to run "gdb" on the process to recover it:

sergey@toy32:~$ gdb ./exec-stat
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/sergey/exec-stat...(no debugging symbols found)...done.
(gdb) b __getpid
Breakpoint 1 at 0x80536d0
(gdb) r
Starting program: /home/sergey/exec-stat 

Breakpoint 1, 0x080536d0 in getpid ()
(gdb) x/x 0x80d60bc
0x80d60bc <_dl_sysinfo>:	0x00110414
(gdb) x/3i 0x00110414
   0x110414 <__kernel_vsyscall>:	int    $0x80
   0x110416 <__kernel_vsyscall+2>:	ret    
   0x110417:				add    %ch,(%esi)
(gdb) 

So this makes some sense: the target is simply a stub around the
instruction that enters the kernel. The reason for the indirection is
that on some x86 CPUs the "sysenter" or "syscall" instructions are
faster than "int 0x80". The kernel supplies the stub appropriate for the
machine, and calling it through a pointer lets even statically compiled
code pick up the right one at runtime. More about this can be found by
searching for "Linux vsyscall" or "Linux vDSO"; e.g.,
http://davisdoesdownunder.blogspot.com/2011/02/linux-syscall-vsyscall-and-vdso-oh-my.html
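
To convince yourself that the kernel entry is all that ultimately
matters, you can bypass the getpid() wrapper with the generic syscall(2)
function. This is only a minimal sketch: demo_raw_getpid is a made-up
name, and syscall() is itself a libc function, so on a setup like the
one above it would presumably reach the kernel through the same kind of
stub.

```c
#define _GNU_SOURCE          /* for the syscall() declaration */
#include <sys/syscall.h>     /* SYS_getpid */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* syscall(), getpid() */

/* Ask the kernel for the PID directly, skipping getpid()'s own logic.
   demo_raw_getpid is a hypothetical helper name for this sketch. */
static pid_t demo_raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling demo_raw_getpid() and getpid() in the same process should give
the same number.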

Secondly, what is all this code surrounding this redispatched system call?
Luckily, we can peek at the source:
http://osxr.org:8080/glibc/source/nptl/sysdeps/unix/sysv/linux/getpid.c?v=glibc-2.13
(really_getpid is where the syscall happens; note that this function
is inlined, so its code is simply merged into __getpid's, with no
call instruction, no stack frame of its own, and no calling convention
for passing arguments).

As you wade through the layers of macros typical of code intended for
many kinds of systems at once (Linux kernel code is the same way), you
will see that the logic is simple enough: it caches the PID (or thread
ID, TID) value to save the overhead of a system call.
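
Here is my reading of that logic, written back out as plain C. Treat it
as a sketch, not as glibc's code: all demo_* names are made up, the
struct is only a stand-in for the per-thread area that the disassembly
reaches through %gs (tid at 0x68, pid at 0x6c), and the counter exists
just to make the kernel entries visible.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-thread area reached via %gs. */
struct demo_thread {
    pid_t tid;   /* plays the role of %gs:0x68 */
    pid_t pid;   /* plays the role of %gs:0x6c */
};

static int demo_syscall_count;   /* how many times we entered the kernel */

/* Slow path, taken only when the cached pid is zero or negative. */
static pid_t demo_really_getpid(struct demo_thread *self, pid_t oldval)
{
    /* Cached pid is exactly 0: fall back on the cached tid, if any. */
    if (oldval == 0 && self->tid != 0)
        return self->tid;

    demo_syscall_count++;
    pid_t result = (pid_t)syscall(SYS_getpid);

    /* Only the tid slot is filled in, and only if the pid was 0. */
    if (oldval == 0)
        self->tid = result;
    return result;
}

/* Fast path: a positive cached pid is returned without any syscall. */
static pid_t demo_getpid(struct demo_thread *self)
{
    pid_t result = self->pid;
    if (result <= 0)
        result = demo_really_getpid(self, result);
    return result;
}
```

Starting from a zeroed struct demo_thread, the first demo_getpid() call
makes one system call and the second one makes none. Compare the three
exits of this sketch with the three "ret" paths in the disassembly.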

Note that although the PLT is in fact found in the compiled code,
the calls lead straight to __getpid, __execve, __sleep, etc.
This is done by way of macros, too: notice "libc_hidden_def" at
the end of Libc's getpid.c and the definitions/explanations in
http://osxr.org:8080/glibc/source/include/libc-symbols.h?v=glibc-2.13#0490
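
The underlying mechanism can be sketched with GCC's alias and visibility
attributes. This is a simplified illustration under my own assumptions,
not the actual expansion of the glibc macros: the demo_* names are made
up, and it needs GCC or a compatible compiler targeting ELF.

```c
/* The public function, visible to users of the library. */
int demo_answer(void)
{
    return 42;
}

/* A second name for the very same code. Being hidden, it is not
   exported from a shared object, so internal callers that use this
   name can be bound to the function directly. */
extern __typeof__(demo_answer) demo_answer_internal
    __attribute__((alias("demo_answer"), visibility("hidden")));
```

Both names reach the same function; demo_answer_internal() is simply the
one that code inside the library would use.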

To completely understand the macros, you need to know some features of
the GNU C preprocessor, such as ## , which pastes two tokens into one
and so affects the compiler's idea of tokenization, and variadic macros.
Specifically, see https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html#Concatenation
and https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros
Reading through https://gcc.gnu.org/onlinedocs/cpp/Macros.html#Macros will
help---especially the "Pitfalls" part.
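
A small warm-up for these preprocessor features may help before diving
into the real headers. All DEMO_* macros below are made up for this
sketch; they show token pasting, a variadic macro, and one classic
pitfall: an argument that is stringified with # is not macro-expanded
first, unless you go through a second macro.

```c
#include <stdio.h>

/* ## pastes two tokens into a single new token. */
#define DEMO_PASTE(a, b)     a##b
#define DEMO_INTERNAL(name)  DEMO_PASTE(demo_internal_, name)

/* # turns its argument into a string literal, without expanding it. */
#define DEMO_STR(x)   #x
/* The extra level lets the argument be expanded before stringifying. */
#define DEMO_XSTR(x)  DEMO_STR(x)

/* A variadic macro: __VA_ARGS__ stands for the trailing arguments.
   buf must be a real array, because sizeof is applied to it. */
#define DEMO_FORMAT(buf, fmt, ...) \
    snprintf((buf), sizeof(buf), fmt, __VA_ARGS__)

/* This defines a function whose name is demo_internal_answer. */
static int DEMO_INTERNAL(answer)(void)
{
    return 42;
}
```

DEMO_STR(DEMO_INTERNAL(answer)) gives the string "DEMO_INTERNAL(answer)",
while DEMO_XSTR(DEMO_INTERNAL(answer)) gives "demo_internal_answer".
Running "gcc -E" on a file like this shows exactly what the compiler
proper gets to see.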

You will also need to google for some of GCC's C extensions, which
syntactically tend to start and end with __
(they are described in https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html,
 which is quite a long read, so piecemeal googling may be faster).
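
Two extensions that show up in the getpid.c source are __builtin_expect
and __attribute__((always_inline)). A minimal sketch of how they are
typically used follows; the demo_* names are made up, and the code
assumes GCC or a compatible compiler.

```c
/* Tell the compiler which way a condition usually goes, so that it can
   lay out the common path as straight-line code. */
#define demo_likely(x)    __builtin_expect(!!(x), 1)
#define demo_unlikely(x)  __builtin_expect(!!(x), 0)

/* always_inline: merge this function into its callers even when the
   optimizer would otherwise decide not to inline it. */
static inline __attribute__((always_inline)) int demo_slow_path(int v)
{
    return -v;
}

static int demo_abs(int v)
{
    if (demo_unlikely(v < 0))      /* hint: negative input is rare */
        return demo_slow_path(v);
    return v;
}
```

The hints never change what the code computes, only how the compiler
arranges it: demo_abs(-5) is 5 with or without them.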

The Linux kernel uses these extensions too:
http://www.ibm.com/developerworks/linux/library/l-gcc-hacks/

The references to %gs:<offset> are for Linux's thread-local storage.
The segment register GS is used to point to the special per-thread
storage area of the running thread; what it points to is changed on
every context switch, so that each thread sees its own area. Various
per-thread pieces of data are found at small standard offsets from the
start of the memory that GS points to in each thread.
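
From C, the same machinery is reachable through GCC's __thread keyword.
The sketch below uses made-up demo_* names and shows that two threads
naming the same variable really touch different memory. On 32-bit x86
the compiler would typically turn such accesses into %gs-relative
addressing; build with "gcc -pthread".

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* One copy of this variable exists per thread. */
static __thread int demo_counter = 0;

struct demo_report {
    uintptr_t addr;   /* where the worker's copy lives */
    int value;        /* what the worker's copy held at the end */
};

static void *demo_worker(void *arg)
{
    struct demo_report *r = arg;

    demo_counter += 100;                 /* touches only this thread's copy */
    r->addr = (uintptr_t)&demo_counter;
    r->value = demo_counter;
    return NULL;
}

/* Returns 1 if the worker had its own copy, 0 if not, -1 on error. */
static int demo_tls_is_per_thread(void)
{
    pthread_t t;
    struct demo_report r = { 0, 0 };

    demo_counter = 1;                    /* the calling thread's copy */
    if (pthread_create(&t, NULL, demo_worker, &r) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;

    /* The worker started from its own initial 0, not from our 1. */
    return r.value == 100
        && demo_counter == 1
        && r.addr != (uintptr_t)&demo_counter;
}
```

Disassembling demo_worker with objdump is a good way to see what the
compiler generates for the access on your own machine.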

A quick summary:
http://stackoverflow.com/questions/24793556/addresses-of-thread-local-storage-variables
http://stackoverflow.com/questions/8747070/thread-local-variables-and-fs-segment

Not surprisingly, details this deep are closely related to the ABI:
http://www.akkadia.org/drepper/tls.pdf , http://wiki.osdev.org/Thread_Local_Storage
(some further questions:
http://stackoverflow.com/questions/12878698/what-are-the-real-elf-tls-abi-requirements-for-each-cpu-arch)

But some of these instructions look seriously weird (these may vary on
your machine!).  "REPZ RET" surely takes the cake. It is functionally
a normal RET, but designed to play well with AMD branch prediction and
even instruction decoding. In case you encounter it, its story is
fascinating: http://repzret.org/p/repzret/

In case you wonder about "lea 0x0(%esi), %esi", it's a NOP---just a very
long 6-byte one. There are some advantages to longer NOPs over a series
of one-byte NOPs, due to how instructions are fetched & decoded
(e.g., http://stackoverflow.com/questions/10505690/what-is-the-meaning-of-lea-0x0esi-esi).

Make sure you understand the logic flow of the above __getpid() code---it's
a very useful exercise in assembly reading.

And that's what it takes to getpid on Linux :)