Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.

gregor herrmann gregoa at debian.org
Wed Jun 14 18:54:17 UTC 2017


On Wed, 14 Jun 2017 19:16:35 +0200, Benjamin Bayart wrote:

> In some cases, some valid UTF-8 Chinese (or Japanese Kanji) characters
> in a Perl string make perl die with "Malformed UTF-8" while matching
> a regexp.
> 
> Here is the smallest program (all in ASCII, for safety) that
> reproduces the problem.

Now that's interesting. I ran the script in a loop on my laptop
(amd64, Debian unstable), and it didn't error out a single time in
over 100_000 runs.

OTOH, on one of my raspis (armhf-ish, Raspbian stretch), it didn't
succeed even once in a couple of tries, and always failed with

Failed Malformed UTF-8 character (fatal) at crash.pl line 8.

And on a third machine, a remote server (amd64, Debian stretch), I
got the first successful run only after more than 400 failures.

All with perl 5.24.1-3. 

So whatever is going on here seems to be non-deterministic …
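
For anyone who wants to repeat the experiment on other architectures,
a rough sketch of such a loop follows. It is not the loop used for the
numbers above, just one way to count failures and note the first
passing run. The helper name count_failures is made up for
illustration, and the sketch assumes a run counts as failed if it
exits non-zero or prints "Malformed UTF-8" anywhere in its output.

```python
import subprocess


def count_failures(cmd, runs, marker="Malformed UTF-8"):
    """Run cmd `runs` times.

    Returns (failures, first_pass), where first_pass is the 1-based
    number of the first successful run, or None if no run succeeded.
    """
    failures = 0
    first_pass = None
    for i in range(1, runs + 1):
        # Merge stderr into stdout so the marker is found wherever
        # the reproducer happens to print it.
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            errors="replace",
        )
        # Assumption: non-zero exit or the marker text means failure.
        if result.returncode != 0 or marker in result.stdout:
            failures += 1
        elif first_pass is None:
            first_pass = i
    return failures, first_pass
```

Calling count_failures(["perl", "crash.pl"], 1000) in the directory
holding the reproducer would then give the failure count and the
first passing run for that machine.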

Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at/ - Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
   `-   NP: Various Artists: Black Velvet Band