SystemTap is Linux's answer to DTrace. It features a custom scripting language that serves the same purposes as D, but is more tightly coupled with Linux source code. SystemTap scripts are ultimately rewritten into C modules that use the kernel's Kprobes mechanism. These modules are transparently compiled and loaded into the kernel by the "stap" command, and deleted right after the script stops running (see below).

You will need to spend some effort setting up SystemTap to work with your kernel.

----[[ Install SystemTap ]]-----

On Ubuntu or Debian, use "apt-get install systemtap" and "apt-get install systemtap-doc". The latter gets installed into /usr/share/doc/systemtap-doc/ and contains a tutorial, a language description, useful tech notes on internals, and a bunch of examples.

Note that to run most examples you will need kernel debug symbols matching your kernel; see below for how to download them. Of course, if you built your own kernel, you'd get the debug symbols from your compiler; on Debian or Ubuntu you just download a debug build of the kernel and trust that it matches the binary kernel included in your distribution.

----[[[ Examining SystemTap ]]]-----

SystemTap consists of several packages, and its components get installed in various places on the system. Take a moment to ask the packaging system where these places are (check out the short help for "dpkg-query -h"):

root@ubuntu64:/root# dpkg-query -W systemtap*
systemtap	1.4-1ubuntu2
systemtap-common	1.4-1ubuntu2
systemtap-doc	1.4-1ubuntu2
systemtap-runtime	1.4-1ubuntu2

List all files owned by a package:

root@ubuntu64:/root# dpkg-query -L systemtap
/.
/usr
/usr/bin
/usr/bin/stap
/usr/share
/usr/share/man
/usr/share/man/man1
/usr/share/man/man1/stap.1.gz
/usr/share/doc
/usr/share/doc/systemtap
/usr/share/doc/systemtap/NEWS.gz
/usr/share/doc/systemtap/README.security

root@ubuntu64:/root# dpkg-query -L systemtap-runtime
/.
/usr
/usr/bin
/usr/bin/staprun
/usr/bin/stap-authorize-signing-cert
/usr/lib
/usr/lib/systemtap
/usr/lib/systemtap/stapio
/usr/share
/usr/share/man
/usr/share/man/man8
/usr/share/man/man8/staprun.8.gz

# dpkg-query -L systemtap-common
[This one is more interesting. Check out the tapsets in /usr/share/systemtap/tapset/; these define many standard functions such as tid(), execname(), etc.]

For example, tid() is defined in /usr/share/systemtap/tapset/context.stp; read and understand its definition (tutorial.pdf from systemtap-doc should be enough for this; look for "embedded C" code).

-----[[[ Getting kernel debuginfo symbols ]]]-----

(from https://wiki.edubuntu.org/Kernel/Systemtap )

My script, based on the above:

--------
#!/bin/bash
codename=$(lsb_release -c | awk '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ ${codename} main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym
--------

Essentially, this script configures your package manager to accept signed packages from ddebs.ubuntu.com, and downloads the kernel debuginfo package matching your distribution and kernel versions. This script demonstrates several very useful features of Bash syntax (command substitution with $(...), a here-document, and variable expansion inside it), so you might want to study it until you understand it :)

When you are done downloading, you'll get the entire (uncompressed) kernel and all of its modules in /usr/lib/debug/ ("# dpkg-query -L linux-image-3.0.0-12-generic-dbgsym" on my system, '3.0.0-12-generic' being the output of `uname -r` from the last line of the above script.)
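A quick way to confirm that the download worked is to look for the debuginfo kernel that belongs to the running kernel. The following is only a minimal sketch: the function names are made up for illustration, and the path layout /usr/lib/debug/boot/vmlinux-<release> is an assumption taken from the Ubuntu system used in these notes; other distributions may place the file elsewhere.

```shell
#!/bin/sh
# Sketch: locate the debuginfo kernel that should match a kernel release.
# Assumption: the -dbgsym package installs it as
# /usr/lib/debug/boot/vmlinux-<release>, as on the Ubuntu system in these notes.

# Print the expected path of the debuginfo kernel for a release string.
# (debug_vmlinux_path is a hypothetical helper name.)
debug_vmlinux_path() {
    printf '/usr/lib/debug/boot/vmlinux-%s\n' "$1"
}

# Report whether the debuginfo kernel for the running kernel is readable.
check_debug_vmlinux() {
    path=$(debug_vmlinux_path "$(uname -r)")
    if [ -r "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path (is the -dbgsym package installed?)"
    fi
}

check_debug_vmlinux
```

If the check reports the file as missing, the most likely cause is a mismatch between `uname -r` and the installed -dbgsym package version.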
Here is the debuginfo kernel on my system:

# ls -l /usr/lib/debug/boot/vmlinux-3.0.0-12-generic
-rw-r--r-- 1 root root 144419436 2011-10-07 16:08 /usr/lib/debug/boot/vmlinux-3.0.0-12-generic
# ls -lh /usr/lib/debug/boot/vmlinux-3.0.0-12-generic
-rw-r--r-- 1 root root 138M 2011-10-07 16:08 /usr/lib/debug/boot/vmlinux-3.0.0-12-generic
# file /usr/lib/debug/boot/vmlinux-3.0.0-12-generic
/usr/lib/debug/boot/vmlinux-3.0.0-12-generic: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

("not stripped" means that the symbol information is still present; strip(1) is the utility that removes it to save disk space)

For comparison, here is my actual kernel:

# ls -l /boot/vmlinuz-3.0.0-12-generic
-rw-r--r-- 1 root root 4658096 2011-10-12 10:34 /boot/vmlinuz-3.0.0-12-generic
# file /boot/vmlinuz-3.0.0-12-generic
/boot/vmlinuz-3.0.0-12-generic: Linux kernel x86 boot executable bzImage, version 3.0.0-12-generic (buildd@creste, RO-rootFS, root_dev 0x801, swap_dev 0x4, Normal VGA

It's much smaller (4.5M) and is stored compressed. (The "bz" in bzImage stands for "big zImage", not for bzip; the actual compression algorithm is chosen when the kernel is built.)

-----[[[ DWARF: what is that "debuginfo"? ]]]-----

Read Michael Eager's introduction to the DWARF format, http://www.dwarfstd.org/doc/Debugging%20using%20DWARF.pdf at http://dwarfstd.org/ , for a brief but illuminating description of the format.

-----[[ Running and patching SystemTap example scripts ]]-----

inode-watch.stp in /usr/share/doc/systemtap-doc/examples/tutorial/ should now match its probes against your running kernel (by consulting the downloaded -- and hopefully fully address-wise matching! -- kernel binary with debuginfo in DWARF format). It takes three parameters: device major number, device minor number, and inode number in the filesystem on that device.
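When the script reports a hit, it prints the device as a single hexadecimal number rather than as a major/minor pair. Inside the kernel the pair is packed as (major << 20) | minor, so major 8 and minor 1 show up as 0x800001. A small sketch of that arithmetic, plus a helper for reading an inode number (both function names are hypothetical, chosen for this example):

```shell
#!/bin/sh
# Pack a major/minor pair the way the Linux kernel's MKDEV() macro does:
# the minor number occupies the low 20 bits, the major number sits above it.
# (kernel_dev is a hypothetical helper name.)
kernel_dev() {
    printf '0x%x\n' $(( ($1 << 20) | $2 ))
}

# Print the inode number of a file, taken from the first column of "ls -i".
# (inode_of is a hypothetical helper name.)
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

kernel_dev 8 1    # prints 0x800001
```

Note that this packing is the kernel-internal one; user-space tools that print a device number use a different encoding, so do not expect them to show the same value.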
root@ubuntu64:/root# touch YYYY
root@ubuntu64:/root# ls -i YYYY
4947 YYYY
root@ubuntu64:/root# mount
/dev/sda1 on / type ext4 (rw,errors=remount-ro,commit=0)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
root@ubuntu64:/root# ls -l /dev/sda1
brw-rw---- 1 root disk 8, 1 2014-01-29 23:06 /dev/sda1

So major no: 8, minor no: 1, inode no: 4947. (Major 8 identifies the SCSI/SATA disk driver; minor 1 is the first partition of the first such disk.)

NOTE: In class, I first looked at /dev/sda, the device representing the raw disk rather than the Linux root filesystem partition /dev/sda1. /dev/sda exposes the drive's boot sector (dd if=/dev/sda of=boot.sect bs=512 count=1; xxd boot.sect) and the partition table of the disk, but not its file systems. "mount" is your best guide to which partitions are in use ("mounted"); each partition gets its own block device file in /dev, with its own major and minor number.

root@ubuntu64:/usr/share/doc/systemtap-doc/examples/tutorial# stap inode-watch.stp 8 1 4947
vim(9345) vfs_read 0x800001/4947

Note that if you use a text editor to edit the file, it may delete and recreate the file with a different inode number! E.g.:

root@ubuntu64:/root# ls -i YYYY
4947 YYYY
root@ubuntu64:/root# vim YYYY
root@ubuntu64:/root# ls -i YYYY
66601 YYYY

-----[[[ Patching a script ]]]-----

The badname.stp example fails to match the second argument of the may_create function in fs/namei.c because that function is declared as inline.
Inlining causes the matching of the second argument "child" to fail:

root@ubuntu64:/usr/share/doc/systemtap-doc/examples/general# stap badname.stp
semantic error: unable to find local 'child' near pc 0xffffffff81172f91 in may_create(/build/buildd/linux-3.0.0/fs/namei.c) (alternatives: $dir): identifier '$child' at badname.stp:16:7
        source:   if ($child->d_inode || $dir->i_flags & 16) next
                      ^
semantic error: unable to find local 'child' near pc 0xffffffff81172f91 in may_create(/build/buildd/linux-3.0.0/fs/namei.c) (alternatives: $dir): identifier '$child' at :19:28
        source:   if (filter(kernel_string($child->d_name->name)))
                                           ^

Looking at the source, http://lxr.free-electrons.com/source/fs/namei.c?v=3.0 , we see that may_create is indeed declared as inline. It's also static, which means it is only used in this file and nowhere else -- which is convenient. Searching for its uses, we see that these occur right at the start of the vfs_* functions, e.g.:

int vfs_create(struct inode *dir, struct dentry *dentry, int mode,
               struct nameidata *nd)
{
        int error = may_create(dir, dentry);

        if (error)
                return error;

and the missing second argument to may_create is actually just the second argument "dentry" to vfs_create and the others. Hence we can easily patch the script to probe the "upstream" vfs_* functions (we'll need to change '$child' to '$dentry', because we are now matching it in the scopes of vfs_create and the others).

Patched script:

-----------------------------------------------------------------------
#!/usr/bin/env stap
# badname.stp
# Prevent the creation of files with undesirable names.
# Source: http://blog.cuviper.com/2009/04/08/hacking-linux-filenames/

# return non-zero if the filename should be blocked
function filter:long (name:string)
{
  return euid() && isinstr(name, "XXX")
}

#
# may_create is inlined in 3.0.0, and fails to match its second argument
# $child. Move the check upstream to vfs_create in the same file.
# The second argument is called "dentry" there, 1st is still "dir"
#

global squash_inode_permission
probe kernel.function("vfs_create@fs/namei.c"),
      kernel.function("vfs_mkdir@fs/namei.c"),
      kernel.function("vfs_symlink@fs/namei.c")
{
  print("in vfs_create\n")

  # screen out the conditions which may_create will fail anyway
  # SB: Make sure you understand why this is needed!
  if ($dentry->d_inode || $dir->i_flags & 16) next

  printf("in vfs_* d_name %s\n", kernel_string($dentry->d_name->name))

  # check that the new file meets our naming rules
  if (filter(kernel_string($dentry->d_name->name))) {
    squash_inode_permission[tid()] = 1
    print("Caught badname in vfs_create\n")
  }
}

probe kernel.function("inode_permission@fs/namei.c").return !,
      kernel.function("permission@fs/namei.c").return
{
  if (!$return && squash_inode_permission[tid()])
    $return = -13 # -EACCES (Permission denied)
  delete squash_inode_permission[tid()]
}
-----------------------------------------------------------------------

When you test this script, don't forget:

a) pass the -g ("guru") option to stap, so that the stap compiler allows you to overwrite the return value;
b) test as a non-root user (root or a sudo user always passes the filter, because for it euid() returns 0).

Probe other functions and run other examples!

------[ Looking at SystemTap scripts in C ]------

SystemTap produces C source for loadable kernel modules. You can catch these files while the stap command is running (or just use the -k option of stap to keep it from deleting them).

The stages of processing a script are described in /usr/share/doc/systemtap-doc/INTERNALS (gunzip it if it comes as a .gz archive). Then, in /usr/share/doc/systemtap-doc, read langref.pdf, tutorial.pdf, and DEVGUIDE. DEVGUIDE and HACKING may give you some project ideas.
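To find the generated files without hunting through /tmp by hand, something like the sketch below may help. It assumes that stap creates its working directory under /tmp with a name beginning with "stap" (that was the case on the system used in these notes; treat it as an assumption elsewhere). Both function names are made up for this example.

```shell
#!/bin/sh
# Print the most recently modified stap working directory under a base
# directory (default /tmp). Assumption: the directory name starts with "stap".
# (latest_stap_dir is a hypothetical helper name.)
latest_stap_dir() {
    base=${1:-/tmp}
    # -t sorts newest first, -d lists the directories themselves,
    # and the trailing slash in the pattern restricts matches to directories.
    ls -td "$base"/stap*/ 2>/dev/null | head -n 1
}

# List the generated C sources and headers in that directory, if there is one.
# (list_stap_sources is a hypothetical helper name.)
list_stap_sources() {
    dir=$(latest_stap_dir "$1")
    if [ -z "$dir" ]; then
        echo "no stap working directory under ${1:-/tmp}"
        return 0
    fi
    ls -l "$dir"*.c "$dir"*.h 2>/dev/null
    return 0
}

list_stap_sources /tmp
```

Run it while a script is still active (or suspended), or after a run with "stap -k", since otherwise the directory is already gone.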
Looking at the build files for stap:

root@ubuntu64:~# stap /usr/share/doc/systemtap-doc/examples/tutorial/inode-watch.stp 1 8 66600
^Z
[1]+  Stopped                 stap /usr/share/doc/systemtap-doc/examples/tutorial/inode-watch.stp 1 8 66600
root@ubuntu64:~# cd /tmp
root@ubuntu64:/tmp# ls -l stap*
total 20644
-rw-r--r-- 1 root root     5891 2016-03-03 22:18 Makefile
-rw-r--r-- 1 root root       69 2016-03-03 22:18 modules.order
-rw-r--r-- 1 root root        0 2016-03-03 22:18 Module.symvers
-rw-r--r-- 1 root root    35068 2016-03-03 22:18 stap_7ede0638d13b982ef3ba7d20902135af_6055.c
-rw-r--r-- 1 root root  4115409 2016-03-03 22:18 stap_7ede0638d13b982ef3ba7d20902135af_6055.ko
-rw-r--r-- 1 root root     3166 2016-03-03 22:18 stap_7ede0638d13b982ef3ba7d20902135af_6055.mod.c
-rw-r--r-- 1 root root     7272 2016-03-03 22:18 stap_7ede0638d13b982ef3ba7d20902135af_6055.mod.o
-rw-r--r-- 1 root root  4111216 2016-03-03 22:18 stap_7ede0638d13b982ef3ba7d20902135af_6055.o
-rw-r--r-- 1 root root      927 2016-03-03 22:18 stapconf_8be06ce6d8238e499f60c07e4ccd5db2_526.h
-rw-r--r-- 1 root root 12843239 2016-03-03 22:18 stap-symbols.h
root@ubuntu64:/tmp# cat stap_7ede0638d13b982ef3ba7d20902135af_6055.mod.c
cat: stap_7ede0638d13b982ef3ba7d20902135af_6055.mod.c: No such file or directory
root@ubuntu64:/tmp# cat stapileKuP/stap_7ede0638d13b982ef3ba7d20902135af_6055.mod.c
#include <linux/module.h>
#include <linux/vermagic.h>
#include <linux/compiler.h>

MODULE_INFO(vermagic, VERMAGIC_STRING);

struct module __this_module
__attribute__((section(".gnu.linkonce.this_module"))) = {
 .name = KBUILD_MODNAME,
 .init = init_module,
#ifdef CONFIG_MODULE_UNLOAD
 .exit = cleanup_module,
#endif
 .arch = MODULE_ARCH_INIT,
};

static const struct modversion_info ____versions[]
__used
__attribute__((section("__versions"))) = {
	{ 0x41572473, "module_layout" },
	{ 0x3ec8886f, "param_ops_int" },
	{ 0xc9ec4e21, "free_percpu" },
	{ 0x84cd86a4, "dput" },
	{ 0x58608d92, "lookup_one_len" },
	{ 0xe7dfa771, "get_fs_type" },
	{ 0x93260715, "register_kprobe" },
	{ 0xac3a0113, "register_kretprobe" },
	{ 0x9f984513, "strrchr" },
	{ 0x4b840dfd, "probe_kernel_read" },
	{ 0xac283e02, "kmalloc_caches" },
	{ 0xa963fd01, "kmem_cache_alloc_trace" },
	{ 0x56e18533, "free_vm_area" },
	{ 0x7e0f069d, "alloc_vm_area" },
	{ 0x4f6b400b, "_copy_from_user" },
	{ 0x6729d3df, "__get_user_4" },
	{ 0xfe7c4287, "nr_cpu_ids" },
	{ 0xc0a3d105, "find_next_bit" },
	{ 0x760a0f4f, "yield" },
	{ 0xb9249d16, "cpu_possible_mask" },
	{ 0xc2cdbf1, "synchronize_sched" },
	{ 0x7866fc51, "unregister_kretprobes" },
	{ 0x8b39cf9d, "unregister_kprobes" },
	{ 0xb88d9172, "mutex_unlock" },
	{ 0xf8ee7211, "mutex_lock" },
	{ 0xe4bca2d1, "pv_cpu_ops" },
	{ 0xe3b0192b, "vscnprintf" },
	{ 0x9eea1a9f, "_raw_read_unlock_irqrestore" },
	{ 0xa10129ea, "_raw_read_lock_irqsave" },
	{ 0x1e6d26a8, "strstr" },
	{ 0x672144bd, "strlcpy" },
	{ 0xcc07af75, "strnlen" },
	{ 0x3928efe9, "__per_cpu_offset" },
	{ 0x11089ac7, "_ctype" },
	{ 0x47c7b0d2, "cpu_number" },
	{ 0xce095088, "mod_timer" },
	{ 0x71205378, "add_timer" },
	{ 0x7d11c268, "jiffies" },
	{ 0x9e1bdc28, "init_timer_key" },
	{ 0xa07accab, "relay_buf_full" },
	{ 0xfae54379, "relay_open" },
	{ 0x6e653c90, "relay_file_operations" },
	{ 0xb3a307c6, "si_meminfo" },
	{ 0x7890fdde, "relay_close" },
	{ 0xf78d76c8, "relay_flush" },
	{ 0xe1bc7ede, "del_timer_sync" },
	{ 0xe2d5255a, "strcmp" },
	{ 0xb85f3bbe, "pv_lock_ops" },
	{ 0x6443d74d, "_raw_spin_lock" },
	{ 0x4cbbd171, "__bitmap_weight" },
	{ 0xbd100793, "cpu_online_mask" },
	{ 0x55f2580b, "__alloc_percpu" },
	{ 0x4f8b5ddb, "_copy_to_user" },
	{ 0xa1c76e0a, "_cond_resched" },
	{ 0xb00ccc33, "finish_wait" },
	{ 0xe75663a, "prepare_to_wait" },
	{ 0x1000e51, "schedule" },
	{ 0xc8b57c27, "autoremove_wake_function" },
	{ 0x23287643, "current_task" },
	{ 0x5a34a45c, "__kmalloc" },
	{ 0x5a5e7ea3, "simple_read_from_buffer" },
	{ 0x9edbecae, "snprintf" },
	{ 0x3628ecea, "debugfs_create_file" },
	{ 0xc0220855, "simple_empty" },
	{ 0x62e66bc1, "debugfs_create_dir" },
	{ 0xdfb20d87, "debugfs_remove" },
	{ 0x27e1a049, "printk" },
	{ 0xf09c7f68, "__wake_up" },
	{ 0xf9a482f9, "msleep" },
	{ 0x236c8c64, "memcpy" },
	{ 0x88941a06, "_raw_spin_unlock_irqrestore" },
	{ 0x587c70d8, "_raw_spin_lock_irqsave" },
	{ 0x37a0cba, "kfree" },
	{ 0xf0fdf6cb, "__stack_chk_fail" },
	{ 0xb4390f9a, "mcount" },
	{ 0x78764f4e, "pv_irq_ops" },
};

static const char __module_depends[]
__used
__attribute__((section(".modinfo"))) =
"depends=";

MODULE_INFO(srcversion, "A1DEDE20C4521106114094F");
-----------------------------------------------------------------------

This .mod.c file carries the "version magic" (the vermagic string plus the table of checksums for the kernel symbols the module uses) that ties a compiled module binary (.ko) to the matching kernel. See Phrack 68:11 http://www.phrack.org/issues/68/11.html for a detailed description of this mechanism.

Your (rewritten) code is in the much larger .c file. Peruse it to find the *_init functions and the body of your probe logic.

Have a look at the other files as well. There is a Makefile that compiles the module against the kernel tree (you'll need to install linux-headers-generic for that; see https://sourceware.org/systemtap/wiki/SystemtapOnUbuntu for Ubuntu, https://sourceware.org/systemtap/wiki/SystemtapOnDebian for Debian; CentOS has a similar HOWTO, but I haven't tested it).

The stap-symbols.h file looks like a mystery, but it's actually DWARF symbol data on the types and variables involved. Even though the file is technically still C source, the data in it is already in compiled (binary) form and has to be decoded before it can be read. There are some tools that do this, but none that work on this file directly, which is a bit of a shame. See the contents of the "dwarves" package (apt-get install dwarves; dpkg-query -L dwarves).
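Coming back to the version magic: one part of that check is easy to reproduce by hand, because the vermagic string begins with the kernel release the module was built for. The sketch below compares only that first word. This is a deliberate simplification (the kernel compares the whole string, and the symbol checksums as well), the function names are hypothetical, and the sample string is an illustrative value rather than one captured from the session above.

```shell
#!/bin/sh
# Print the first whitespace-separated word of a vermagic string,
# which is the kernel release the module was built for.
# (vermagic_release is a hypothetical helper name.)
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Succeed if the vermagic string names the given kernel release.
# Simplification: only the release word is compared, not the flags after it.
vermagic_matches() {
    [ "$(vermagic_release "$1")" = "$2" ]
}

# Illustrative vermagic value (an assumption, not taken from a real module).
sample='3.0.0-12-generic SMP mod_unload modversions '

if vermagic_matches "$sample" "3.0.0-12-generic"; then
    echo "match"
else
    echo "mismatch"
fi
```

On a real module the first argument could come from "modinfo -F vermagic" run on the .ko file, and the second from "uname -r".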