Bug#529305: \w doesn't match c-cedilla, o-diaeresis and u-diaeresis under tr_TR.utf8 and de_DE.utf8 locales
Damyan Ivanov
dmn at debian.org
Mon May 18 14:52:09 UTC 2009
Package: perl
Version: 5.10.0-22
Severity: normal
Showcase:
(requires installing the tr_TR.utf8 and de_DE.utf8 locales via 'dpkg-reconfigure
locales' or installing the locales-all package)
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
setlocale(LC_ALL, "tr_TR.utf8");
print "Locale is ", setlocale(LC_ALL), "\n";
use locale;
use utf8;
binmode STDOUT, ":utf8";
print "$_ is " . ( /\w/ ? "" : "not " ) . "a word character\n"
    for qw( ç ö ş ü ğ ı İ );
The output is
Locale is tr_TR.utf8
ç is not a word character
ö is not a word character
ş is a word character
ü is not a word character
ğ is a word character
ı is a word character
İ is a word character
Looking (with my uneducated eyes) in /usr/share/i18n/locales/tr_TR it seems
that at least c-cedilla (U00E7 in lower case and U00C7 in upper case) should be
treated as an "alpha" character, so the problem seems to be in perl's
interpretation.
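One pattern in the output above may help narrow this down: the three
characters reported as non-word characters (ç U+00E7, ö U+00F6, ü U+00FC) are
exactly the ones whose code points are below U+0100, while the four that match
are all at U+0100 or above. The sketch below only prints the code points so
the split is easy to see; it does not depend on the locale. The helper names
codepoint and below_256 are made up for illustration, and the file is assumed
to be saved as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the source file contains UTF-8 literals

binmode STDOUT, ":utf8";

# Hypothetical helper: format the first character's code point as "U+XXXX".
sub codepoint { return sprintf "U+%04X", ord $_[0] }

# Hypothetical helper: true if the code point is below U+0100,
# i.e. it would fit in a single byte.
sub below_256 { return ord($_[0]) < 0x100 ? 1 : 0 }

for my $ch (qw( ç ö ş ü ğ ı İ )) {
    printf "%s %s %s\n", $ch, codepoint($ch),
        below_256($ch) ? "below U+0100" : "U+0100 or above";
}
```

Whether this split is actually the cause is only a guess based on the output.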
Replacing tr_TR with de_DE in the script gives the same results and at least ö
and ü are definitely German :)
The problem is also reproducible on Lenny's perl 5.10.0-19 and Etch's 5.8.8-7etch6.
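To check whether the locale pragma is what changes the result, the same match
can be run side by side with and without "use locale". This is a minimal
sketch, assuming the tr_TR.utf8 locale is installed; is_word_locale and
is_word_plain are hypothetical helper names, and no expected output is given
because the result may differ between perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the source file contains UTF-8 literals
use POSIX qw(setlocale LC_ALL);

binmode STDOUT, ":utf8";

# Same locale as in the showcase; setlocale returns undef if it is missing.
setlocale(LC_ALL, "tr_TR.utf8");

# Hypothetical helper: \w with the locale pragma in effect.
# "use locale" is lexically scoped, so it only applies inside this sub.
sub is_word_locale {
    use locale;
    return $_[0] =~ /\w/ ? 1 : 0;
}

# Hypothetical helper: \w with the locale pragma explicitly off.
sub is_word_plain {
    no locale;
    return $_[0] =~ /\w/ ? 1 : 0;
}

for my $ch (qw( ç ö ş ü ğ ı İ )) {
    printf "%s  with locale: %d  without locale: %d\n",
        $ch, is_word_locale($ch), is_word_plain($ch);
}
```

If the two columns differ only for ç, ö and ü, that would point at the locale
handling rather than at the strings themselves.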
Thanks for your time.
dam
-- System Information:
Debian Release: squeeze/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable'), (450, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.29-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=bg_BG.UTF-8, LC_CTYPE=bg_BG.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages perl depends on:
ii libc6 2.9-12 GNU C Library: Shared libraries
ii libdb4.6 4.6.21-13 Berkeley v4.6 Database Libraries [
ii libgdbm3 1.8.3-4 GNU dbm database routines (runtime
ii perl-base 5.10.0-22 minimal Perl system
ii perl-modules 5.10.0-22 Core Perl modules
ii zlib1g 1:1.2.3.3.dfsg-13 compression library - runtime
Versions of packages perl recommends:
ii make 3.81-5 The GNU version of the "make" util
ii netbase 4.34 Basic TCP/IP networking system
Versions of packages perl suggests:
ii libterm-readline-gnu-perl 1.19-1 Perl extension for the GNU Readlin
ii libterm-readline-perl-perl 1.0302-1 Perl implementation of Readline li
ii perl-doc 5.10.0-22 Perl documentation
-- no debconf information