Bug#529305: \w doesn't match c-cedilla, o-diaeresis and u-diaeresis under tr_TR.utf8 and de_DE.utf8 locales

Damyan Ivanov dmn at debian.org
Mon May 18 14:52:09 UTC 2009


Package: perl
Version: 5.10.0-22
Severity: normal


Showcase:
(requires installing the tr_TR.utf8 and de_DE.utf8 locales via 'dpkg-reconfigure
locales' or installing the locales-all package)

 #!/usr/bin/perl
 use strict;
 use warnings;
 use POSIX qw(setlocale LC_ALL);
 setlocale(LC_ALL, "tr_TR.utf8");
 print "Locale is ", setlocale(LC_ALL), "\n";

 use locale;
 use utf8;
 binmode STDOUT, ":utf8";

 print "$_ is " . ( /\w/ ? "" : "not " ) . "a word character\n"
    for qw( ç ö ş ü ğ ı İ );

The output is

 Locale is tr_TR.utf8
 ç is not a word character
 ö is not a word character
 ş is a word character
 ü is not a word character
 ğ is a word character
 ı is a word character
 İ is a word character

Looking (with my uneducated eyes) in /usr/share/i18n/locales/tr_TR, it seems
that at least c-cedilla (U00E7 in lower case and U00C7 in upper case) should be
treated as an "alpha" character, so the problem seems to be in perl's
interpretation.
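
One way to narrow this down further is to print the code points of the seven
test characters. The sketch below is not part of the original script;
describe() is a hypothetical helper and the script deliberately omits
"use locale", since it only inspects the characters. It shows that the three
characters reported as non-word (ç, ö, ü) are exactly the ones below U+0100,
while the four that match are at U+0100 or above. Whether that boundary is
what perl's locale handling actually trips over is an assumption, not
something verified here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                      # the literals below are UTF-8 in the source
binmode STDOUT, ":utf8";

# Hypothetical helper: report a character's code point and on which
# side of U+0100 it falls. No "use locale" on purpose.
sub describe {
    my ($ch) = @_;
    my $cp = ord $ch;
    return sprintf "%s U+%04X %s", $ch, $cp,
        $cp < 0x100 ? "below U+0100" : "U+0100 or above";
}

print describe($_), "\n" for qw( ç ö ş ü ğ ı İ );
```

If the pattern holds, the same split should show up for any other character
pair chosen from either side of U+0100.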

Replacing tr_TR with de_DE in the script gives the same results and at least ö
and ü are definitely German :)

The problem is also reproducible with Lenny's perl 5.10.0-19 and Etch's 5.8.8-7etch6.


Thanks for your time.

        dam


-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable'), (450, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.29-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=bg_BG.UTF-8, LC_CTYPE=bg_BG.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages perl depends on:
ii  libc6                  2.9-12            GNU C Library: Shared libraries
ii  libdb4.6               4.6.21-13         Berkeley v4.6 Database Libraries [
ii  libgdbm3               1.8.3-4           GNU dbm database routines (runtime
ii  perl-base              5.10.0-22         minimal Perl system
ii  perl-modules           5.10.0-22         Core Perl modules
ii  zlib1g                 1:1.2.3.3.dfsg-13 compression library - runtime

Versions of packages perl recommends:
ii  make                          3.81-5     The GNU version of the "make" util
ii  netbase                       4.34       Basic TCP/IP networking system

Versions of packages perl suggests:
ii  libterm-readline-gnu-perl     1.19-1     Perl extension for the GNU Readlin
ii  libterm-readline-perl-perl    1.0302-1   Perl implementation of Readline li
ii  perl-doc                      5.10.0-22  Perl documentation

-- no debconf information





