Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.

Niko Tyni ntyni at debian.org
Wed Jun 14 18:28:44 UTC 2017


Control: tag -1 confirmed

On Wed, Jun 14, 2017 at 07:16:35PM +0200, Benjamin Bayart wrote:
> Package: perl
> Version: 5.24.1-3
> Severity: normal
> Tags: upstream

> In some cases, valid UTF-8 Chinese (or Japanese kanji) characters
> in a Perl string make perl die with "Malformed UTF-8" while matching
> a regexp.
> 
> Here is the smallest program (all in ASCII, for safety) that
> reproduces the problem.
 
Thanks for the report and the test case.

Running this with debugperl under valgrind shows invalid memory
accesses; the log is below.

It also happens with 5.26.0, but indeed not with the jessie 5.20
perl.

I got it down to a somewhat simpler form:

--------------------
  #!/usr/bin/perl
  
  use strict;
  use warnings;
  
  my $text = "%t%\x{6bce}";
  
  $text =~ s{~*%[a-z]%}{}g;
  print "Works, for now\n";
--------------------

which still crashes here and shows similar valgrind errors.
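
To back up the claim that the input is valid, the test string can be
inspected without going through the regexp engine at all. This is only a
minimal sketch: utf8_hex is a hypothetical helper name, not something from
the report, and the checks use the core utf8:: functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the report): returns the UTF-8 encoding
# of a string as space-separated hex bytes.
sub utf8_hex {
    my ($str) = @_;
    my $bytes = $str;          # work on a copy
    utf8::encode($bytes);      # characters -> UTF-8 octets, in place
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# Same string as in the reduced test case.
my $text = "%t%\x{6bce}";

printf "UTF-8 flag on:    %s\n", utf8::is_utf8($text) ? 'yes' : 'no';
printf "internally valid: %s\n", utf8::valid($text)   ? 'yes' : 'no';
printf "characters:       %d\n", length $text;
printf "encoded bytes:    %s\n", utf8_hex($text);
```

U+6BCE encodes as the three bytes e6 af 8e, so the string is four
characters and six bytes long. If these checks pass on an affected perl,
that points at the matching code rather than at the data.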

I'll try to bisect this and forward upstream.
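
For comparing perl builds, something along these lines could serve as a
pass/fail probe. It is a sketch only: try_subst is a hypothetical name, and
because the underlying fault is a read past the end of a buffer, whether
the error shows up may depend on how the string happens to be allocated.
Passing the string through a sub, as done here, might change that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical probe (not from the report): runs the substitution from the
# reduced test case inside eval and returns '' on success, or the error
# message if perl died.
sub try_subst {
    my ($text) = @_;
    my $ok = eval { $text =~ s{~*%[a-z]%}{}g; 1 };
    return $ok ? '' : ($@ || 'unknown error');
}

my $err = try_subst("%t%\x{6bce}");

# $^V is the running interpreter's version; %vd prints it as e.g. 5.24.1.
printf "perl %vd: %s\n", $^V, $err eq '' ? "ok" : "FAILED: $err";
```

A bisect driver could turn the result into an exit status, but a clean run
of this probe does not prove the invalid reads are gone; running under
valgrind remains the more reliable check.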


==15091== Memcheck, a memory error detector
==15091== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15091== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==15091== Command: debugperl 864782.pl
==15091== 
==15091== Invalid read of size 1
==15091==    at 0x4C30027: memchr (vg_replace_strmem.c:883)
==15091==    by 0x20795B: Perl_fbm_instr (util.c:828)
==15091==    by 0x311B9C: Perl_re_intuit_start (regexec.c:907)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091== 
==15091== Invalid read of size 1
==15091==    at 0x2FB0D1: S_reginclass (regexec.c:9038)
==15091==    by 0x30BB9C: S_find_byclass (regexec.c:1869)
==15091==    by 0x312806: Perl_re_intuit_start (regexec.c:1293)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091== 
==15091== Invalid read of size 1
==15091==    at 0x30BB67: S_find_byclass (regexec.c:1869)
==15091==    by 0x312806: Perl_re_intuit_start (regexec.c:1293)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091== 
==15091== Invalid read of size 1
==15091==    at 0x31960D: Perl_utf8n_to_uvchr (utf8.c:558)
==15091==    by 0x2FB1EF: S_reginclass (regexec.c:9046)
==15091==    by 0x30BB9C: S_find_byclass (regexec.c:1869)
==15091==    by 0x312806: Perl_re_intuit_start (regexec.c:1293)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f4a8 is 24 bytes after a block of size 64 in arena "client"
==15091== 
Failed Malformed UTF-8 character (fatal) at 864782.pl line 8.
==15091== 
==15091== HEAP SUMMARY:
==15091==     in use at exit: 125,719 bytes in 242 blocks
==15091==   total heap usage: 4,969 allocs, 4,727 frees, 917,501 bytes allocated
==15091== 
==15091== LEAK SUMMARY:
==15091==    definitely lost: 8,192 bytes in 16 blocks
==15091==    indirectly lost: 117,527 bytes in 226 blocks
==15091==      possibly lost: 0 bytes in 0 blocks
==15091==    still reachable: 0 bytes in 0 blocks
==15091==         suppressed: 0 bytes in 0 blocks
==15091== Rerun with --leak-check=full to see details of leaked memory
==15091== 
==15091== For counts of detected and suppressed errors, rerun with: -v
==15091== ERROR SUMMARY: 304 errors from 4 contexts (suppressed: 0 from 0)

-- 
Niko Tyni   ntyni at debian.org



