Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

John Hughes john at calva.com
Fri Apr 25 15:10:36 UTC 2014


Package: libwww-perl
Version: 5.836-1
Severity: normal

This was horrible to narrow down, but:

1. I'm doing a POST to a HTTPS url
2. Some of my headers contain iso-8859-1 data
3. The body is sent with transfer-encoding: chunked
4. The "is_utf8" bit was set on the data (although it happens to be
   all in code points < 256).

(changing *any* of these conditions makes the bug go away).
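
To illustrate condition 4, here is a minimal sketch in core Perl of how a string can carry the utf8 bit while holding only code points below 256, and how joining it to a plain byte string upgrades the result. The variable names are illustrative only, and this shows Perl string behaviour, not LWP internals.

```perl
use strict;
use warnings;

# A plain byte string: iso-8859-1 data, utf8 bit off.
my $header = "Subject: " . ("\xae" x 12);

# Taking a substring of a string that contains a wide character leaves
# the utf8 bit on, even though the remaining character (U+00AE) is < 256.
my $body = substr("\x{f00f}\xae", 1, 1);

# Concatenation upgrades the whole result to Perl's internal utf8 form.
my $joined = $header . $body;

# length() counts characters; encoding a copy gives the octet count of
# the upgraded form.
my $chars  = length $joined;
my $octets = do { my $copy = $joined; utf8::encode($copy); length $copy };

printf "header utf8 bit: %s\n", utf8::is_utf8($header) ? "on" : "off";
printf "body utf8 bit:   %s\n", utf8::is_utf8($body)   ? "on" : "off";
printf "joined utf8 bit: %s\n", utf8::is_utf8($joined) ? "on" : "off";
printf "characters: %d, octets when upgraded: %d\n", $chars, $octets;
```

The gap between the two counts is one octet per character in the range 128 to 255, header and body alike.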

The request headers get corrupted: they are sent as utf-8 instead of
iso-8859-1.

Some of the data doesn't get sent, which messes up the chunk counts or
even trashes the request headers.

The number of missing bytes seems related to the difference in length
between the iso-8859-1 headers and the incorrect utf-8 versions.

For example my request should look like:

----
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:4433
User-Agent: LWP UTF8 BUG
Subject: ®®®®®®®®®®®®
Transfer-Encoding: chunked

1
®
0

----

But it is sent as:

----
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:4433
User-Agent: LWP UTF8 BUG
Subject: ®®®®®®®®®®®®
Transfer-Encoding: chunk0

----
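
The truncation above can be reproduced with the following sketch. It rests on an assumption that has not been checked against the LWP or SSL code: that the headers and the first chunk are joined into one upgraded string, and that the write then sends only as many octets of the utf-8 form as the string has characters. The variable names are illustrative.

```perl
use strict;
use warnings;

my $CRLF = "\r\n";

# Headers as in the example request: iso-8859-1 octets, utf8 bit off.
my $headers = join($CRLF,
    "POST / HTTP/1.1",
    "TE: deflate,gzip;q=0.3",
    "Connection: TE, close",
    "Host: localhost:4433",
    "User-Agent: LWP UTF8 BUG",
    "Subject: " . ("\xae" x 12),
    "Transfer-Encoding: chunked",
) . $CRLF . $CRLF;

# One-character body with the utf8 bit set, framed as a single chunk.
my $body  = substr("\x{f00f}\xae", 1, 1);
my $chunk = sprintf("%x", length $body) . $CRLF . $body . $CRLF;

# ASSUMPTION: headers and first chunk end up in one upgraded string.
my $joined = $headers . $chunk;

# Octets of the upgraded string, i.e. its utf-8 encoding.
my $utf8 = $joined;
utf8::encode($utf8);

# ASSUMPTION: only length-in-characters octets get written, followed by
# the terminating zero-length chunk in a separate write.
my $excess = length($utf8) - length($joined);
my $sent   = substr($utf8, 0, length $joined) . "0" . $CRLF . $CRLF;

print "octets pushed off the end: $excess\n";
print $sent;
```

Under these assumptions the excess is 13 octets (12 from the Subject header and 1 from the body), and the output ends in "Transfer-Encoding: chunk0" as in the capture above.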

Here's my test program:

----
#! /usr/bin/perl

use strict;
use LWP::UserAgent;

my $agent = LWP::UserAgent->new (agent => 'LWP UTF8 BUG');

# Bug only happens if https
my $req = HTTP::Request->new (POST => 'https://localhost:4433');

# Bug only happens if utf8 bit is set on data to be written
my $body = substr ("\x{f00f}\xae", 1, 1);

print "utf8 bit set\n" if utf8::is_utf8($body);

# Bug only happens with chunked content
my $read_body = sub {
	my $buf = $body;
	$body = "";
	$buf
};

$req->content ($read_body);

# Bug only happens if header with iso-8859-1 data
$req->header (Subject => "\xae" x 12);

my $ret = $agent->request ($req);

# Request sent is malformed - iso-8859-1 data sent as utf-8 and
# bytes missing from output (number of bytes missing related to the
# difference in length between iso-8859-1 and utf-8 representations).
----
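
Since changing condition 4 makes the bug go away, one possible workaround is to clear the utf8 bit on the data before handing it to the content callback. This is a sketch only, not tested against other LWP versions. utf8::downgrade is core Perl; the helper name as_octets is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string with the utf8 bit off.
# utf8::downgrade dies if the string holds a code point above 255, which
# is a reasonable failure mode for data meant to go out as iso-8859-1.
sub as_octets {
    my ($str) = @_;
    utf8::downgrade($str);
    return $str;
}

# Same body as in the test program, but downgraded before use.
my $body = as_octets(substr("\x{f00f}\xae", 1, 1));

print "utf8 bit set\n" if utf8::is_utf8($body);
print "length: ", length($body), "\n";
```

With the body passed through as_octets first, the "utf8 bit set" line is no longer printed and the data is a single iso-8859-1 octet.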



-- System Information:
Debian Release: 6.0.7
  APT prefers oldstable
  APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libwww-perl depends on:
ii  libhtml-parser-perl    3.66-1            collection of modules that parse H
ii  libhtml-tagset-perl    3.20-2            Data tables pertaining to HTML
ii  libhtml-tree-perl      3.23-2            Perl module to represent and creat
ii  liburi-perl            1.54-2            module to manipulate and access UR
ii  netbase                4.45              Basic TCP/IP networking system
ii  perl                   5.10.1-17squeeze6 Larry Wall's Practical Extraction 

Versions of packages libwww-perl recommends:
ii  libhtml-format-perl    2.04-2            format HTML syntax trees into text
ii  libio-compress-perl    2.024-1           bundle of IO::Compress modules
ii  libmailtools-perl      2.06-1            Manipulate email in perl programs
ii  perl [libio-compress-p 5.10.1-17squeeze6 Larry Wall's Practical Extraction 

Versions of packages libwww-perl suggests:
ii  libcrypt-ssleay-perl     0.57-2          Support for https protocol in LWP
ii  libio-socket-ssl-perl    1.33-1+squeeze1 Perl module implementing object or

-- no debconf information


