Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

Niko Tyni ntyni at debian.org
Fri Apr 25 20:01:22 UTC 2014


found 745823 6.06-1
thanks

On Fri, Apr 25, 2014 at 05:10:36PM +0200, John Hughes wrote:
> Package: libwww-perl
> Version: 5.836-1
> Severity: normal
> 
> This was horrible to narrow down, but:
> 
> 1. I'm doing a POST to a HTTPS url
> 2. Some of my headers contain iso-8859-1 data
> 3. The body is sent with transfer-encoding: chunked
> 4. the "is_utf8" bit was set on the data (although it happens to be
>    all in code points < 256).
> 
> (changing *any* of these conditions makes the bug go away).
> 
> The request headers get corrupted: they are sent in utf-8 instead of
> iso-8859-1.
> 
> Some of the data doesn't get sent, messing up the chunked counts or
> even trashing the request headers.
> 
> The number of missing bytes seems related to the difference in length
> between the iso-8859-1 headers and the incorrect utf-8 versions.

Interesting. I can reproduce this on (mostly current) sid with
libwww-perl 6.06-1.
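
To make the length difference mentioned in the report concrete, here is a
minimal sketch. The header value is made up for illustration and is not
taken from the test program. It only shows how far the two encodings
diverge in byte length; it does not demonstrate the corruption itself.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical header value; U+00E9 stands in for any iso-8859-1
# character above 0x7f.
my $value = "caf\x{e9}";

# The same characters serialized in each encoding.
my $latin1 = encode('iso-8859-1', $value);
my $utf8   = encode('UTF-8', $value);

# Each character above 0x7f takes one byte in iso-8859-1 and two in utf-8.
printf "iso-8859-1: %d bytes, utf-8: %d bytes, difference: %d\n",
    length($latin1), length($utf8), length($utf8) - length($latin1);
```

Every header character above 0x7f adds one byte to this difference.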

> Here's my test program:

[...]
> # Bug only happens if https
> my $req = HTTP::Request->new (POST => 'https://localhost:4433');
> 
> # Bug only happens if utf8 bit is set on data to be written
> my $body = substr ("\x{f00f}\xae", 1, 1);
> 
> print "utf8 bit set\n" if utf8::is_utf8($body);
> 
> # Bug only happens with chunked content
> my $read_body = sub {
> 	my $buf = $body;
> 	$body = "";
> 	$buf
> };
> 
> $req->content ($read_body);

Quoting HTTP::Request documentation:

     $r->content( $bytes )
           This is used to get/set the content and it is inherited from
           the "HTTP::Message" base class.  See HTTP::Message for details
           and other methods that can be used to access the content.

           Note that the content should be a string of bytes.  Strings in
           perl can contain characters outside the range of a byte.
           The "Encode" module can be used to turn such strings into a
           string of bytes.

So this is not totally unexpected, but the particular failure mode you've
run into is certainly rather horrible.

Possibly the content() method should croak when the UTF8 bit is set? 
(I suppose it can't encode the string automatically as it doesn't know
which encoding should be used.)
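
As a possible workaround on the caller's side, here is a minimal sketch
that encodes the body to bytes before the content callback hands it over.
It assumes the server expects iso-8859-1. assert_bytes() is a hypothetical
helper along the lines of the croak suggested above, not part of
libwww-perl.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same construction as in the report: a one-character string holding
# U+00AE that still carries the internal UTF8 flag.
my $body = substr("\x{f00f}\xae", 1, 1);

# Assumption: iso-8859-1 is the intended wire encoding. encode()
# returns a byte string without the UTF8 flag.
my $bytes = encode('iso-8859-1', $body);

# Hypothetical guard: refuse anything that is still a character string.
sub assert_bytes {
    my ($data) = @_;
    die "content must be a byte string\n" if utf8::is_utf8($data);
    return $data;
}

# Content callback that only ever returns bytes; an empty string on the
# second call signals the end of the body.
my $read_body = sub {
    my $buf = assert_bytes($bytes);
    $bytes = "";
    return $buf;
};
```

When all code points are known to be below 256, as here,
utf8::downgrade($body) should have the same effect without naming an
encoding.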
-- 
Niko Tyni   ntyni at debian.org


