[Ltrace-devel] Tracing multi-threaded processes

pmachata at redhat.com
Tue Jun 21 16:10:25 UTC 2011


Hi there,

there is some support for multi-threaded processes in ltrace, but so far
it has been incomplete.  Everything works as long as the threads stay
away from each other, but as soon as they end up in the same area of
code, it all breaks.

The problem is due to return breakpoints.  When two threads call the
same function, ltrace places two return breakpoints at the same address,
one over the other, because it has no concept of a shared address
space.  This causes many problems: ltrace ends up seeing unexpected
breakpoints, and the traced process crashes with SIGSEGV.
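To illustrate the stacking problem, here is a minimal C model. It is not ltrace code: the traced program's text is a plain byte array, a breakpoint saves the original byte and writes an x86 int3 (0xCC), and naive_insert/naive_remove/simulate_overlap are hypothetical names.  When a second thread arms a return breakpoint at the same address, it saves the first thread's int3 as the "original" byte, so once both are removed an int3 is left behind in the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BREAKPOINT_INSN 0xCC   /* x86 int3 */

/* Hypothetical model: a few bytes of the traced program's text. */
static uint8_t text[16];

/* Naive per-thread breakpoint: each thread keeps its own saved byte,
 * unaware that the other threads share the same address space. */
struct naive_bp {
    size_t addr;
    uint8_t saved;
};

static void naive_insert(struct naive_bp *bp, size_t addr)
{
    assert(addr < sizeof text);
    bp->addr = addr;
    bp->saved = text[addr];        /* may save another thread's int3! */
    text[addr] = BREAKPOINT_INSN;
}

static void naive_remove(const struct naive_bp *bp)
{
    text[bp->addr] = bp->saved;
}

/* Two threads call the same function, so both get a return breakpoint
 * at the same return address; thread A returns first.  Returns the byte
 * left in memory after both breakpoints have been removed. */
static uint8_t simulate_overlap(void)
{
    struct naive_bp a, b;
    text[4] = 0x55;                /* original instruction byte */

    naive_insert(&a, 4);           /* thread A saves 0x55 */
    naive_insert(&b, 4);           /* thread B saves 0xCC */

    naive_remove(&a);              /* restores 0x55: B's trap is gone too */
    naive_remove(&b);              /* later writes 0xCC back into the code */
    return text[4];                /* 0xCC: a stray int3 nobody owns */
}
```

The leftover int3 is exactly the kind of "unexpected breakpoint" described above, and thread B's return is never seen because thread A's removal already wiped out its trap.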

To solve this, ltrace must first learn that there are such things as
tasks and thread groups.  Then it needs to store all the breakpoints in
a structure shared by all the tasks in the thread group.  To prevent
races, all tasks in the thread group must be stopped before any
breakpoint is temporarily disabled (to step over it and re-enable it,
namely in continue_after_breakpoint).  Otherwise another task could run
through the breakpoint address while the original instruction is in
place, and ltrace would miss that hit.

There is code on the branch pmachata/threads that implements this.
Here's roughly what the branch does:

 - a field Process *leader was added to struct Process.  It points to
   the leader of the thread group that this process is a member of.

 - proper interfaces were added for handling the set of processes and
   their tasks (add_process, remove_process, each_process, each_task).
   The iteration interfaces (each_*) use call-backs to do the real work.

 - interfaces were added for querying information about the processes
   (process_leader, process_tasks, process_stopped, process_status).

 - a new interface, task_kill, wraps the SYS_tkill system call, which
   glibc does not wrap.  We use it to stop or continue a single task.

 - when we need to stop tasks for breakpoint re-enablement, we send
   them SIGSTOP.  This SIGSTOP has to be caught and sunk.  While we wait
   for the signal to be delivered, we pump all incoming events into an
   event queue created for this purpose (each_qd_event, enque_event).
   The interface next_event takes events from this queue if there are
   any.

 - all of this (event interception, sinking of SIGSTOP, etc.) is very
   platform-specific.  So a thread group can now have a registered event
   handler (install_event_handler, destroy_event_handler).  If present,
   it is called at the beginning of handle_event.  The registered
   handler can do whatever it wishes with the event in question, and
   returns either NULL (if the event was handled or sunk) or the
   original (possibly modified) event, which is then handled by the
   default handler as usual.

 - there have also been some small cleanups.
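The leader pointer, callback-based iteration and task_kill described above could look roughly like this.  It is a sketch, not the code on the branch: the trimmed-down Process, the explicit list parameter of each_task, the leader pointing to itself, and the callback convention (nonzero stops the iteration) are all assumptions, and stop_task and count_task are hypothetical helpers.  syscall(2) with SYS_tkill is the real Linux interface.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, trimmed-down Process with only the fields used here. */
typedef struct Process Process;
struct Process {
    pid_t pid;
    Process *leader;  /* thread group leader; assumed to point to itself for the leader */
    Process *next;    /* next entry in the list of all traced tasks */
};

/* Hypothetical callback convention: return nonzero to stop iterating. */
typedef int (*task_cb)(Process *task, void *data);

/* Call CB on every task in LIST that shares PROC's thread group leader.
 * Returns the task that stopped the iteration, or NULL. */
static Process *each_task(Process *list, Process *proc, task_cb cb, void *data)
{
    assert(proc != NULL && proc->leader != NULL);
    for (Process *t = list; t != NULL; t = t->next)
        if (t->leader == proc->leader && cb(t, data))
            return t;
    return NULL;
}

/* glibc provides no tkill() wrapper, so invoke the system call directly.
 * Returns 0 on success, -1 with errno set on failure. */
static int task_kill(pid_t tid, int sig)
{
    return (int)syscall(SYS_tkill, tid, sig);
}

/* Direct SIGSTOP at one specific thread ID rather than at the process. */
static int stop_task(Process *task)
{
    return task_kill(task->pid, SIGSTOP);
}

/* Example callback: count the tasks in a thread group. */
static int count_task(Process *task, void *data)
{
    (void)task;
    ++*(int *)data;
    return 0;
}
```

Passing a callback such as count_task keeps the traversal logic in one place, so callers never need to know how the set of tasks is stored.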
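Similarly, here is a minimal model of the event queue and the per-thread-group handler hook, assuming a simplified Event type and a queue that is a plain singly linked FIFO.  The names mirror the mail (enque_event, next_event, handle_event), but their signatures here are hypothetical, and the example hook sink_sigstop simply sinks every SIGSTOP, whereas a real handler would sink only the SIGSTOPs that ltrace itself sent.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical, heavily simplified event. */
enum event_type { EV_SIGNAL, EV_BREAKPOINT };

typedef struct Event Event;
struct Event {
    enum event_type type;
    int pid;
    int signum;
    Event *next;      /* link in the event queue */
};

/* FIFO of events pumped while waiting for our SIGSTOPs to arrive. */
static Event *queue_head, *queue_tail;

static void enque_event(Event *ev)
{
    assert(ev != NULL);
    ev->next = NULL;
    if (queue_tail != NULL)
        queue_tail->next = ev;
    else
        queue_head = ev;
    queue_tail = ev;
}

/* Return the oldest queued event, or NULL if the queue is empty.  A real
 * implementation would then go on to wait for a fresh event instead. */
static Event *next_event(void)
{
    Event *ev = queue_head;
    if (ev != NULL) {
        queue_head = ev->next;
        if (queue_head == NULL)
            queue_tail = NULL;
    }
    return ev;
}

/* Hook type: return NULL if the event was handled or sunk, otherwise the
 * (possibly modified) event for the default handler. */
typedef Event *(*event_handler)(Event *ev, void *data);

struct thread_group {
    event_handler handler;  /* NULL when no handler is installed */
    void *handler_data;
    int default_handled;    /* events that reached the default path */
};

static void handle_event(struct thread_group *g, Event *ev)
{
    if (g->handler != NULL) {
        ev = g->handler(ev, g->handler_data);
        if (ev == NULL)
            return;             /* handled or sunk by the hook */
    }
    g->default_handled++;       /* stand-in for the usual processing */
}

/* Example hook: sink every SIGSTOP (a real one would sink only its own). */
static Event *sink_sigstop(Event *ev, void *data)
{
    (void)data;
    if (ev->type == EV_SIGNAL && ev->signum == SIGSTOP)
        return NULL;
    return ev;
}
```

Keeping the hook optional means platforms without special needs never pay for it: with handler left NULL, handle_event behaves exactly as before.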

For some reason, attaching to a running multi-threaded process doesn't
work (this was one of the first things I fixed, but apparently it got
broken in the meantime), so that's what I'll be working on next.

Then comes cleaning it all up and making the git history of my branch a
bit less messy, at which point I'll ask some of you to review the
(rather large) patch.  I also need to verify that it works on non-x86
architectures; so far I have only worked with x86_64.  I'll keep you
posted as my work progresses.

Any comments are welcome.

Thanks,
PM
