Bug#521177: libwww-perl: LWP::UserAgent::request() fails with 'Wide character in syswrite' when posting UTF-8 encoded body

Ruzsa Balazs ruzsa.balazs at interware.co.hu
Wed Mar 25 13:53:25 UTC 2009


Package: libwww-perl
Version: 5.813-1
Severity: important


Here is what I tried to do:

------------cut------------
#!/usr/bin/perl

use strict;
use warnings;
use encoding 'iso-8859-2';

use Encode;
use LWP::UserAgent;
use HTTP::Request;

my $POST_URL = "http://somewhere.net/webservice.php";

my $xml = <<"EOT";
<?xml version="1.0" encoding="utf-8" ?>

<PACKET>
<TEXT>Árvíztűrő tükörfúrógép</TEXT>
</PACKET>
EOT

my $ua = LWP::UserAgent->new();
my $request = HTTP::Request->new('POST', $POST_URL);
my $content = encode('utf-8', $xml);
$request->header('Content-Type' => 'text/xml; charset=utf-8');
$request->header('Content-Length' => length($content));
$request->content($content);
my $response = $ua->request($request);
------------cut------------

Here is what I get when Perl tries to execute the last line:

------------cut------------
failed: 500 Wide character in syswrite
Content-Type: text/plain
Client-Date: Wed, 25 Mar 2009 13:21:30 GMT
Client-Warning: Internal response

500 Wide character in syswrite
------------cut------------

The text in the <TEXT> element is a Hungarian test phrase that contains every
accented letter of the Hungarian alphabet. It is stored as ISO-8859-2 in the
source file. Thanks to the 'use encoding' pragma, Perl converts it to character
semantics (utf8 flag on) when it reads the source.
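The conversion the pragma performs on the literal can also be written out explicitly with Encode. A minimal sketch, assuming the source bytes are ISO-8859-2 and using only the first word of the phrase; the variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Árvíztűrő" as it is stored in an ISO-8859-2 source file (raw bytes).
my $latin2_bytes = "\xC1rv\xEDzt\xFBr\xF5";

# Explicitly decode the bytes into a character string (utf8 flag on).
my $text = decode('iso-8859-2', $latin2_bytes);

# 0xFB and 0xF5 decode to U+0171 (ű) and U+0151 (ő): code points above 255.
printf "%d characters, utf8 flag %s\n",
    length($text), utf8::is_utf8($text) ? 'on' : 'off';
printf "U+%04X U+%04X\n", ord(substr($text, 6, 1)), ord(substr($text, 8, 1));

# Encoding to UTF-8 yields a byte string (utf8 flag off) for the request body.
my $utf8_bytes = encode('UTF-8', $text);
printf "%d bytes, utf8 flag %s\n",
    length($utf8_bytes), utf8::is_utf8($utf8_bytes) ? 'on' : 'off';
```

Note that ű and ő decode to U+0171 and U+0151, so the decoded string does contain characters above 255 until it is encoded to UTF-8.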

After some bug hunting, I traced the problem to
/usr/share/perl5/LWP/Protocol/http.pm:

202: my $req_buf = $socket->format_request($method, $fullpath, @h);
...
235: if ($has_content) {
...
249: my $buf = $req_buf . $$content_ref; # <--- HERE

If $$content_ref holds a byte string (utf8 flag off) and $req_buf holds a
character string (utf8 flag on), the concatenation upgrades the contents of
$$content_ref to character semantics, by default interpreting each byte as an
ISO-8859-1 character. This happens even if $req_buf contains only ASCII
characters. In my example, Perl thus turns my UTF-8 encoded test phrase into a
string in which the individual bytes of each UTF-8 sequence masquerade as
separate characters.
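The upgrade described above can be reproduced in isolation. A minimal sketch under Perl's default semantics, i.e. without `use encoding` in effect; the header text and variable names are illustrative and not taken from LWP:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Request body: "Á" encoded as UTF-8, i.e. the two bytes 0xC3 0x81 (flag off).
my $body = encode('UTF-8', "\x{C1}");

# Header buffer: pure ASCII, but with the utf8 flag turned on.
my $hdr = "POST /webservice.php HTTP/1.1\r\n\r\n";
utf8::upgrade($hdr);

# Mixed-flag concatenation: Perl upgrades a copy of $body, treating each
# byte as one ISO-8859-1 character.
my $buf = $hdr . $body;

my $tail = substr($buf, length $hdr);
printf "flag: %s, tail: %s\n",
    utf8::is_utf8($buf) ? 'on' : 'off',
    join ' ', map { sprintf('U+%04X', ord($_)) } split //, $tail;

# The operand itself is not modified by the concatenation.
printf "body still a byte string: %s\n", utf8::is_utf8($body) ? 'no' : 'yes';
```

Each UTF-8 byte of the body ends up as its own character (U+00C3, U+0081), but every one of them is still below 256.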

What I don't understand: LWP::UserAgent should still be able to send the
resulting string over the wire. It is "semantically" wrong but "syntactically"
right, since it contains only characters with code points below 256. So I don't
see where the "wide characters" (which I take to be characters with code points
of 256 or above) are coming from.
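To find out which characters actually trigger the error, the buffer can be scanned for code points above 255 before it is written. A minimal diagnostic sketch; `find_wide_chars` is a hypothetical helper, not part of LWP. One possible explanation worth checking, stated as an assumption rather than verified against this setup: in the Perl 5.8/5.10 series `use encoding` also sets the global `${^ENCODING}` variable, and implicit byte-to-character upgrades anywhere in the program (including inside LWP) may then decode the bytes as ISO-8859-2 instead of ISO-8859-1. Under that assumption the UTF-8 lead byte 0xC3 would become U+0102 (Ă), a wide character, which is what the example input below simulates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical diagnostic helper: report every character in $str whose
# code point is above 255, i.e. a character that cannot be written to a
# byte-oriented handle without an encoding layer.
sub find_wide_chars {
    my ($str) = @_;
    my @found;
    while ($str =~ /([^\x00-\xFF])/g) {
        # pos() is just past the one-character match, so subtract 1.
        push @found, sprintf('U+%04X at offset %d', ord($1), pos($str) - 1);
    }
    return @found;
}

# Simulated buffer: header followed by 0xC3 0x81 as they would look if
# 0xC3 had been decoded as ISO-8859-2 (U+0102) instead of ISO-8859-1.
my @wide = find_wide_chars("POST / HTTP/1.1\r\n\r\n\x{102}\x{81}");
print "$_\n" for @wide;
```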

Anyway, the problem can be resolved with the following lines added after line
#202:

    my $req_buf = $socket->format_request($method, $fullpath, @h);
    use Encode;
    if (Encode::is_utf8($req_buf)) {
      Encode::_utf8_off($req_buf);
    }

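This simply makes sure that the buffer holding the HTTP headers does not have
the 'utf8' flag turned on. I can only hope that the $req_buf returned by
format_request contains only ASCII characters (it should), because _utf8_off
merely clears the flag and would otherwise expose the internal UTF-8 bytes of
any non-ASCII character.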

With this change, the concatenation above leaves the bytes of $$content_ref as
they are, and the request is posted without errors.
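A stricter variation on the patch, sketched here as an assumption rather than a tested LWP change: `utf8::downgrade($buf, 1)` converts a flagged string to bytes only if every character fits in a byte and returns false otherwise, whereas `Encode::_utf8_off` just clears the flag. `to_byte_string` below is a hypothetical helper, and the header and body values are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): return a byte-string copy of a
# header buffer, refusing to continue if it holds characters above 255.
sub to_byte_string {
    my ($buf) = @_;    # work on a copy
    # With FAIL_OK set, utf8::downgrade returns false instead of dying.
    utf8::downgrade($buf, 1)
        or die "header buffer contains wide characters\n";
    return $buf;
}

my $req_buf = "POST /webservice.php HTTP/1.1\r\n\r\n";
utf8::upgrade($req_buf);                # simulate a utf8-flagged header buffer
my $body    = "\xC3\x81rv\xC3\xADz";    # UTF-8 bytes of "Árvíz", flag off

# Both operands are byte strings now, so the body bytes are appended as-is.
my $buf = to_byte_string($req_buf) . $body;
printf "flag: %s, length: %d\n", utf8::is_utf8($buf) ? 'on' : 'off', length $buf;
```

The design choice is to fail loudly: if a header ever did contain a character above 255, the helper dies instead of sending mangled bytes.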


-- System Information:
Debian Release: 5.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.28.7prana (PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libwww-perl depends on:
ii  libhtml-parser-perl        3.56-1+b1     A collection of modules that parse
ii  libhtml-tagset-perl        3.20-2        Data tables pertaining to HTML
ii  libhtml-tree-perl          3.23-1        represent and create HTML syntax t
ii  liburi-perl                1.35.dfsg.1-1 Manipulates and accesses URI strin
ii  netbase                    4.34          Basic TCP/IP networking system
ii  perl [libdigest-md5-perl]  5.10.0-19     Larry Wall's Practical Extraction 

Versions of packages libwww-perl recommends:
ii  libcompress-zlib-perl         2.012-1    Perl module for creation and manip
pn  libhtml-format-perl           <none>     (no description available)
ii  libmailtools-perl             2.03-1     Manipulate email in perl programs

Versions of packages libwww-perl suggests:
ii  libio-socket-ssl-perl         1.16-1     Perl module implementing object or

-- no debconf information





More information about the pkg-perl-maintainers mailing list