Bug#466341: Some ISO-2022-JP text cannot be roundtripped

Niko Tyni ntyni at debian.org
Sat Apr 4 17:39:40 UTC 2009


found 466341 5.10.0-19
retitle 466341 support the Encode::decode CHECK argument with ISO-2022-JP
severity 466341 wishlist
thanks

On Mon, Feb 18, 2008 at 01:36:55AM -0500, Bryan Donlan wrote:
> Package: perl
> Version: 5.8.8-12
> Severity: normal
> 
> Converting a certain sequence of ISO-2022-JP text to utf8 succeeds:
> $  perl -MEncode -e '$s= "{\x1b\x24\x42\x2d)\x1b(B}"; print
> encode("utf8", decode("iso-2022-jp", $s, Encode::FB_CROAK)), "\n"'
> {⑨}
> 
> However, converting it back to ISO-2022-JP fails:
> $ perl -MEncode -e '$s= "{\x1b\x24\x42\x2d)\x1b(B}"; print
> encode("iso-2022-jp", decode("iso-2022-jp", $s, Encode::FB_CROAK)),
> "\n"'
> {\x{2468}}
> 
> It should be noted that iconv rejects this entirely:
> $ perl -MEncode -e '$s= "{\x1b\x24\x42\x2d)\x1b(B}"; print $s,
> "\n"'|iconv -f iso-2022-jp -t utf8
> {iconv: illegal input sequence at position 4
> 
> However, if this is truly invalid iso-2022-jp, perl should croak on it, since
> FB_CROAK was passed.

It is indeed an invalid sequence; iconv is right about that. The original
JIS C 6226 (aka JIS X 0208) standard can be found at e.g. [1], and it
does not contain 0x2d 0x29, the byte pair embedded in your
ISO-2022-JP-encoded example.

The bug here seems to be that the Encode module handling ISO-2022-JP
(Encode::JP::JIS7) ignores the CHECK argument. The Encode documentation
states:

 NOTE: Not all encoding support this feature
  Some encodings ignore CHECK argument.  For example, Encode::Unicode ignores CHECK and it
  always croaks on error.

so I am lowering the severity.
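
Until the module honors CHECK, a caller can approximate strict decoding on
its own side. A minimal Perl sketch, assuming only the core Encode functions
decode and encode plus the FB_CROAK and LEAVE_SRC flags; the helper name
decode_iso2022jp_roundtrip is hypothetical. It accepts input only if it
decodes and then re-encodes to the identical byte string, so the sample from
this report is expected to be rejected whether the installed Encode croaks
on decoding or fails to re-encode U+2468.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Encode: decode ISO-2022-JP and accept the
# result only if it encodes back to exactly the input bytes.
# LEAVE_SRC stops Encode from consuming the source string when CHECK is set.
# Caveat: valid but non-canonical input (e.g. ESC $ @ instead of ESC $ B, or
# redundant escape sequences) is rejected as well.
sub decode_iso2022jp_roundtrip {
    my ($bytes) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $text = eval { Encode::decode('iso-2022-jp', $bytes, $check) };
    return unless defined $text;
    my $again = eval { Encode::encode('iso-2022-jp', $text, $check) };
    return unless defined $again && $again eq $bytes;
    return $text;
}

my $sample = "{\x1b\x24\x42\x2d)\x1b(B}";
my $result = decode_iso2022jp_roundtrip($sample);
print defined($result) ? "accepted\n" : "rejected\n";
```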

[1] http://www.itscj.ipsj.or.jp/ISO-IR/087.pdf

Cheers,
-- 
Niko Tyni   ntyni at debian.org
