Hello, my name is Nikos Naziridis and I am a security researcher at CENSUS.
In this post, I will present how SystemTap, and kernel instrumentation
in general, can be used to help determine the exploitability of
unbounded memory overflows and to detect thread race condition bugs.
For the reader who is not familiar with SystemTap and the concepts of
kernel instrumentation, I will attempt a small introduction; for more
details, please refer to the links at the end of this post.
In this post I will be talking about Linux kernel versions >= 2.6 and SystemTap versions >= 2.2. If you feel at ease with SystemTap or with how kernel
instrumentation works, you may skip to the next section.
Kernel instrumentation is a set of techniques that allows a user to monitor
and trace the execution of a kernel. One popular implementation of kernel
instrumentation is the Kprobes (Kernel Probes) system. Kprobes allows a user
to develop a kernel module that will hotpatch specific instructions in the
kernel code with trampoline functions. When such a patched instruction is
executed, the execution flow jumps into one of these trampoline functions,
runs the user-provided code and then returns to the original instruction.
SystemTap automates the process of developing a kernel module and abstracts
the kernel specifics away from the user. It does this by exposing a variety of
probe points for Kprobes and providing utility functions in a scripting
language it understands. When a script in this language is run through
SystemTap, it is translated to C code, compiled as a kernel module and
loaded into the running kernel automatically.
Imagine a heap buffer overflow vulnerability that is the result of an
unsigned integer overflow in the value used as the size parameter of a
memcpy() call. The nature of the bug, while common enough,
introduces complications to its exploitation. The following snippet,
though unrealistic, is sufficient to showcase the bug:
    int one = ...;
    int two = ...;
    char stack[] = "...";
    char *heap = (char *)malloc(sizeof(one));
    memcpy(heap, stack, (one - two));
It is apparent that if two was larger than one, then the result of
(one - two) would be negative. Of course, the size parameter that
memcpy() expects is a size_t, which is an unsigned type. This would
result in an unsigned integer wraparound, and the negative integer would
be interpreted as a very large number.
Unable to detect this, memcpy() would try to copy this large number of
bytes from stack to heap until it triggered an access violation. So,
determining the exploitability of a vulnerability such as this boils down
to whether the attacker can control the size of the overflow or not.
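The conversion described above can be reproduced in isolation. The following is a minimal C sketch, not code from any real target: the helper names effective_copy_length() and length_wrapped() are hypothetical and exist only for illustration. It computes the length that memcpy() would actually receive for a given one and two, assuming the signed subtraction itself does not overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the length memcpy() would receive for
 * memcpy(heap, stack, one - two).
 * Assumes (one - two) itself does not overflow the int range. */
size_t effective_copy_length(int one, int two)
{
    int diff = one - two;   /* signed subtraction; negative when two > one */
    return (size_t)diff;    /* conversion to unsigned is modulo SIZE_MAX + 1 */
}

/* Hypothetical helper: non-zero when the subtraction went negative and
 * the resulting length landed in the upper half of the size_t range. */
int length_wrapped(int one, int two)
{
    return effective_copy_length(one, two) > SIZE_MAX / 2;
}
```

For example, effective_copy_length(4, 8) yields SIZE_MAX - 3, which on a system with a 64-bit size_t is 18446744073709551612 bytes: far more than any mapping can hold, hence the eventual access violation.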
One solution to situations similar to the above is to try to provide a
value for two such that (one - two) would produce an integer underflow
and wrap around to a size that suits the attack scenario. But there are
many occasions where this is not possible, so let’s assume this is the case.
Since inducing an underflow is out of the question and there are no
arithmetic operations that could provide control over the overflow size,
another solution would be to look at the memcpy() implementation itself.
Gobbles’ apache-scalp exploit for BSD systems, in 2002, solved a
similar case by abusing the fact that the BSD memcpy() stores the
number of remaining bytes to be written in a stack variable. So, by overwriting this value, an attacker could dynamically control the overflow size. In this case, though, the overflow copies data from the stack to the heap, so again this
would not work.
If there is no “conventional” way to control the overflow size, then how about
trying to stop or delay it? Imagine that the snippet above was part of a large threaded program. Then, in theory, there could be a thread race condition
situation (not necessarily a bug) that would allow us to use a portion
of the overflowed memory from one thread, before the preempted thread that
does the memcpy() reaches a protected/unmapped area. But even if such a
thing was possible, how would we debug this?
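To make the idea more concrete, here is a toy user-space simulation in C. It is only a sketch under strong assumptions: the runaway memcpy() is replaced by a byte-wise loop that publishes its progress, the region is a fixed-size array rather than real heap memory, and every name in it (slow_copy, observer, run_window_demo, REGION_SIZE, WINDOW) is hypothetical. It shows a second thread consuming the first bytes of the copy as soon as they are written, regardless of how far the copying thread still has to go. Depending on the toolchain, it may need to be built with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define REGION_SIZE 4096   /* stand-in for the memory being overwritten */
#define WINDOW      64     /* bytes the second thread wants to use */

static unsigned char region[REGION_SIZE];
static atomic_size_t copied;   /* number of bytes written so far */

/* Stand-in for the thread stuck in the oversized copy. */
static void *slow_copy(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < REGION_SIZE; i++) {
        region[i] = 0x41;
        atomic_store_explicit(&copied, i + 1, memory_order_release);
    }
    return NULL;
}

/* Second thread: waits until the first WINDOW bytes exist, then uses them
 * while the copy may still be in progress. */
static void *observer(void *arg)
{
    size_t *seen = arg;
    size_t ok = 0;

    while (atomic_load_explicit(&copied, memory_order_acquire) < WINDOW)
        ;   /* spin until enough bytes have been overwritten */

    for (size_t i = 0; i < WINDOW; i++)
        ok += (region[i] == 0x41);
    *seen = ok;
    return NULL;
}

/* Returns how many of the first WINDOW bytes the observer saw overwritten,
 * or 0 if a thread could not be created. */
size_t run_window_demo(void)
{
    pthread_t writer, reader;
    size_t seen = 0;

    atomic_store(&copied, 0);
    if (pthread_create(&reader, NULL, observer, &seen) != 0)
        return 0;
    if (pthread_create(&writer, NULL, slow_copy, NULL) != 0)
        return 0;
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    return seen;
}
```

In a real target there is no progress counter to wait on, which is exactly why the question of when, and by whom, the copying thread gets preempted matters.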
The Linux kernel uses a scheduler with dynamic priorities that supports preemption. This means that at any given moment the thread being executed can be replaced by another thread that the scheduler considers more important.
Since the scheduling happens in the kernel, it should be “accessible” from a
kernel module. So, by using SystemTap we should be able to monitor the
preemption. Indeed, we could use the
scheduler.cpu_off probe to
do something like:
    global goflag = 0, interesting_pid = 0

    # probe scheduler every time a task is switched
    probe scheduler.cpu_off
    {
        # if execution reached the interesting point
        if (goflag)
        {
            # find the pid and base (dynamic) priority of
            # the tasks involved
            prevpid = task_pid(task_prev);
            nextpid = task_pid(task_next);
            prevprio = task_prio(task_prev);
            nextprio = task_prio(task_next);
            # inform the user
            printf("switched %lu (p: %lu) to %lu (p: %lu)\n",
                   prevpid, prevprio, nextpid, nextprio);
        }
    }
This adds a probe point (called a tap in SystemTap’s lingo) that is called
every time a thread is scheduled off a CPU core. If you ran it without
guarding on goflag, it would produce garbage-ridden output containing
every thread preemption occurring in the system.
To actually produce output that is relevant to the problem at hand, we need to
be able to run our code from just before the memcpy() call until a SIGSEGV
or other terminating condition occurs. Fortunately, SystemTap implements
many different probe points that can help with this, namely the
process().statement() probes, which fire when execution reaches a given
statement of a user-space program.
Therefore, we can use something like the following to start monitoring:
    # set a probe for the interesting point
    # use some globals to enable probing when
    # the execution reaches whatever.c:1337 (file:line_number)
    # (PATH is a placeholder for the path of the target binary)
    probe process("PATH").statement("*@whatever.c:1337")
    {
        # store the current pid (interesting pid)
        currtask = task_current();
        interesting_pid = task_pid(currtask);
        # set the go flag
        goflag = 1;
        printf("reached interesting point; starting...\n");
    }
To define an ending condition, we can add:
    # if you detect an access violation (SIGSEGV == 11)
    # (signal.send exposes the signal number as sig)
    probe signal.send
    {
        # check if it is intended for the interesting task
        currtask = task_current();
        currpid = task_pid(currtask);
        if (interesting_pid == currpid)
            if (sig == 11) # SIGSEGV
            {
                # inform and die
                printf("detected SIGSEGV to process %lu; stopping...\n",
                       currpid);
                exit();
            }
    }
In the case that there are no debugging symbols available for our target application, we could use
process().statement(ADDRESS).absolute and provide an absolute
virtual address for the probe.
By using start and end conditions, the above script would only show threads
preempting in the critical time window.
Using SystemTap and Kprobes we have implemented a way to examine the threads
that preempt the thread that does the vulnerable
memcpy() call. We can also put our target application under stress conditions (for example,
forcing it to process large amounts of user input) in order to see if the
memcpy() thread can indeed be preempted by some other thread.
If there is such a thread, we can now carefully study it in order to see if
it accesses the partially overflowed memory, or any variables overwritten
by it, and determine if there are exploitable conditions.
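As a coarse user-space cross-check of whether preemption happens at all under stress, the involuntary context switch counter reported by getrusage() can be sampled around a region of code. The sketch below is an illustration only: the helper names (involuntary_switches, busy_work, preemptions_during) are hypothetical, busy_work() is merely a stand-in for the code of interest, and RUSAGE_SELF counts switches for the whole process rather than a single thread. Unlike the SystemTap script, it cannot tell which thread did the preempting.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Involuntary context switches of the whole process so far, or -1 on error. */
long involuntary_switches(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_nivcsw;
}

/* Stand-in for the code region of interest (for example, a long copy). */
void busy_work(void *arg)
{
    volatile unsigned long sink = 0;

    (void)arg;
    for (unsigned long i = 0; i < 10000000UL; i++)
        sink += i;
}

/* Number of preemptions observed while fn(arg) ran, or -1 on error. */
long preemptions_during(void (*fn)(void *), void *arg)
{
    long before = involuntary_switches();
    fn(arg);
    long after = involuntary_switches();

    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

A call such as preemptions_during(busy_work, NULL) returning a non-zero value on a loaded machine indicates that the region was interrupted at least once; a value of zero on an idle machine says little either way.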
Check the following links for more details on the subject discussed: