[Ltrace-devel] Getting prototypes from debug information

Thu May 1 23:26:42 UTC 2014

Hello there.

Petr Machata <pmachata at redhat.com> writes:

> Dima Kogan <lists at dima.secretsauce.net> writes:
>
>>> 1. filter_matches_symbol() and dwlf is leaking. Every time I call these
>>> we leak a bit. The dwfl is out of our hands, but the
>>> filter_matches_symbol() maybe could be plugged:
>>
>> I looked into this. Ltrace is definitely leaking it. The regex is
>> released when filter_destroy() calls filter_rule_destroy(), but those
>> are not called by anything.
>
>Ah, there we go.  I just saw that we call regfree, but didn't check
>whether we actually call those.  Will you roll this into your change
>set, or should I look into it?

I'd rather you looked at it, if you don't mind.

>> 2. The prototypes my DWARF parser produces leak a bit. Those are stored
>> with protolib_add_prototype(). Is that sufficient? I.e. do protolibs
>> eventually clean out their prototypes?
>>
>>   ==22679== 624 bytes in 13 blocks are definitely lost in loss record 142 of 152
>>   ==22679==    at 0x4C29590: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>   ==22679==    by 0x42AA08: get_prototype (dwarf_prototypes.c:787)

I plugged this leak yesterday. Sorry for sending out the email before I
did that.

> By the way, looking into dwarf_prototypes.c, I see a number of style
> points that will need addressing before this is accepted:
>
> - Tabs should be 8 characters.  E.g.:
> - Lines should be no longer than 80 characters.  Same example as above.
> - Pointer star belongs to the variable, not the type
> - Lines shouldn't be formatted into tables arbitrarily.  E.g. this:
> - if and while should get a space before the paren.

Fine. I'll patch it.

> We do care somewhat.  If it's not ridiculously hard to free things, we
> should strive to free them to keep valgrind runs clean.  Ideally if the
> test suite passes, it passes with --enable-valgrind as well (it used to
> at some point, but I don't know anymore).

Right. The new code should be leak-free now.

>> Note that nanosleep didn't get its prototype parsed even though the
>> DWARF is available. It turns out that the exported symbol is indeed
>> called "nanosleep", but the DWARF definitions have a DW_AT_name of
>> "__nanosleep" and a DW_AT_linkage_name "__GI___nanosleep". The DWARF has
>> no mention at all of "nanosleep", which is the main issue. How can we
>> infer a connection between those two?
>
> I think what should work here is to look at DW_AT_low_pc of the
> DW_TAG_subprogram (dwarf_lowpc in libdw seems to be handling this) and
> cross-match it with ELF symbol tables, where each address will have a
> number of alias symbols.  I guess you could just walk through ltrace's
> own data structures, struct library and struct library_symbol, and look
> into ::enter_addr of the latter to figure out what symbol name ltrace
> assigned to this address.

OK. I'll try that.

>> Another issue. I have debug symbols for libjpeg installed, so I wanted
>> to run a program that uses this library to see if the DWARF is parsed
>> correctly. Not only is the DWARF not being loaded, the library isn't
>> being traced at all:
>
> Not sure what could be going wrong.  My geeqie isn't linked vs. libjpeg,
> but for me the above works for other libraries that geeqie is linked
> against.  Does it break for you for, say, libm as well?  That gets a
> number of calls in my case.

I looked deeper into this, and it appears to work with a particular set
of ltrace options.

First off, I'm on Debian running geeqie 1:1.1-8+b1. Here geeqie is
multi-threaded, and I guess without -f geeqie only looks at the first
thread. All the libjpeg calls apparently come from the second thread, so
by default ltrace sees nothing. Running with -f makes it work better.
This is a bit puzzling to the end-user, but I guess this is fine, since
this is what strace does. (Does strace work this way for threads, by the
way, or just forks?) The command at this point is

 ./ltrace -l 'libjpeg.so*' /usr/bin/geeqie /tmp/small.jpg

Second, running this way reports libjpeg->libjpeg calls correctly, but
not geeqie->ltrace ones. So some libjpeg functions still get the wrong
prototypes. For instance I see this:

 geeqie->jpeg_read_header(0x7f23f62506c0, 1, 0, 1496 <unfinished ...>

I looked into this a bit. import_DWARF_prototypes() was only called on
libjpeg.so.8. jpeg_read_header() was indeed parsed correctly, the parsed
data just wasn't used. Were we looking in the 'geeqie' plib instead of
the 'libjpeg.so.8' plib? If I run with

 ./ltrace -x '@libjpeg.so*' /usr/bin/geeqie /tmp/small.jpg

then way more than libjpeg.so is instrumented, and the application slows
to a crawl. Is this right? Should the options preclude this? If I do

./ltrace -x '@libjpeg.so*' -e '@libjpeg.so*' /usr/bin/geeqie /tmp/small.jpg

then jpeg_read_header works correctly. Is this a bug in the new code?

Also, regarding -l, -e and -x, I feel this is more confusing than it
needs to be. As a user, I generally want either 'show me all calls into
this library' or 'show me all calls into this function'. I guess this is
-l and -x, except -x ends up instrumenting way more (see above). And
then what's -e for? 

OK. Thanks for the review as always.