Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal

Niko Tyni ntyni at debian.org
Sun Jan 29 18:24:25 UTC 2017


Control: found -1 5.24.1-1

On Sun, Jan 29, 2017 at 06:23:30PM +0100, Leszek Dubiel wrote:
> Package: perl
> Version: 5.20.2-3+deb8u6
> Severity: normal
> 
> This is stripped out program version that causes error: 
> 
> 	printf "\x41\x9c\x5a\x0a" | perl -CS -e '$_ = <>; /^(.*)$/ && print "($1)\n"; /[^#]*/;'
> 
> It displays: 
> 
> 	(A�Z)
> 	Malformed UTF-8 character (fatal) at -e line 1, <> line 1.
> 
> Locale is pl_PL.UTF-8 . 

This still happens with 5.24.1-1. It can be reduced to

 printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'

The byte sequence is indeed invalid UTF-8 (iconv rejects it as well):
0x9c is a continuation byte with no lead byte in front of it. But you're
explicitly telling Perl (with -CS) that it's getting UTF-8 on stdin.
This is a recipe for problems.
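A minimal sketch of the alternative, which is not part of the original thread: instead of declaring stdin to be UTF-8 with -CS/-CI, treat the input as bytes and decode it explicitly with the core Encode module. A strict decode (FB_CROAK) rejects the reduced input, while the default fallback (FB_DEFAULT) substitutes U+FFFD for the bad byte, so later regex matches see a well-formed string. Hard-coding the input string here is an assumption for illustration; a real script would read stdin with `binmode STDIN, ':raw'` and decode each line.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The reduced input from the report; 0x9c cannot start a UTF-8 sequence.
my $bytes = "\x9c\x5a";

# Strict decoding dies on malformed input; LEAVE_SRC keeps $bytes intact.
my $ok = eval { decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 };
print $ok ? "valid UTF-8\n" : "invalid UTF-8\n";

# Lenient decoding replaces the malformed byte with U+FFFD instead of
# producing a malformed string for the regex engine to trip over.
my $text = decode('UTF-8', $bytes, Encode::FB_DEFAULT);
printf "%vX\n", $text;    # code points of the decoded string, dot-separated
print "matched\n" if $text =~ /[^#]*/;
```

Whether to reject or repair such input is a policy choice for the calling program; the point is that the decision is made explicitly rather than left to the -C switch.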

So I'm not sure whether this is a bug at all. At most, the failure
should be handled a bit more gracefully.
-- 
Niko Tyni   ntyni at debian.org



