Bug#880798: Wide character in print at /usr/bin/json_pp line 82

Dominic Hargreaves dom at earth.li
Fri Nov 17 15:03:45 UTC 2017


Control: forwarded -1 https://rt.cpan.org/Ticket/Display.html?id=123653

On Sun, Nov 05, 2017 at 04:31:19AM +0800, 積丹尼 Dan Jacobson wrote:
> X-Debbugs-Cc: makamaka at cpan.org
> Package: perl
> Version: 5.26.1-2
> File: /usr/bin/json_pp
> 
> This command line utility should have all character set issues already
> solved internally, no?
> 
> $ set http://radioscanningtw.jidanni.org/index.php?title=%E9%A6%96%E9%A0%81
> $ GET http://archive.org/wayback/available?url=$@
> {"url": "http://radioscanningtw.jidanni.org/index.php?title=\u9996\u9801", "archived_snapshots": {"closest": {"status": "200", "available": true, "url": "http://web.archive.org/web/20171104183618/http://radioscanningtw.jidanni.org/index.php?title=%E9%A6%96%E9%A0%81", "timestamp": "20171104183618"}}}
> 
> $ GET http://archive.org/wayback/available?url=$@ | json_pp
> Wide character in print at /usr/bin/json_pp line 82, <STDIN> chunk 1.

It looks like this is working as advertised. From json_pp(1): 

"  -json_opt
    options to JSON::PP

    Acceptable options are:

        ascii latin1 utf8 pretty indent space_before space_after relaxed canonical allow_nonref
        allow_singlequote allow_barekey allow_bignum loose escape_slash
"

From JSON::PP(3perl):

"   utf8
           $json = $json->utf8([$enable])

           $enabled = $json->get_utf8

       If $enable is true (or missing), then the encode method will encode the
       JSON result into UTF-8, as required by many protocols, while the decode
       method expects to be handled an UTF-8-encoded string. Please note that
       UTF-8-encoded strings do not contain any characters outside the range
       0..255, they are thus useful for bytewise/binary I/O.

       (In Perl 5.005, any character outside the range 0..255 does not exist.
       See to "UNICODE HANDLING ON PERLS".)

       In future versions, enabling this option might enable autodetection of
       the UTF-16 and UTF-32 encoding families, as described in RFC4627.

       If $enable is false, then the encode method will return the JSON string
       as a (non-encoded) Unicode string, while decode expects thus a Unicode
       string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs to be
       done yourself, e.g. using the Encode module.
"

I do agree that having to supply that flag is not intuitive, although
I'm not sure whether this is easily fixable. For some output formats I
can see that it would not make sense to always set the utf8 flag (for
example the second example in the json_pp manpage), but perhaps json_pp
could be a bit cleverer in situations where it ends up printing
non-ASCII characters to the terminal.
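
For reference, a minimal sketch of what the documentation quoted above
implies. The sample input is a shortened, hypothetical stand-in for the
Wayback response (example.org is not from the report), and I am assuming
that -json_opt replaces the default "pretty" option rather than adding to
it, which is why pretty is repeated here:

```shell
# Hypothetical sample input; the \u escapes stay literal inside single quotes.
json='{"url": "http://example.org/index.php?title=\u9996\u9801"}'

# utf8: encode the result as UTF-8 bytes, so print only ever sees
# characters in the range 0..255.
printf '%s\n' "$json" | json_pp -json_opt utf8,pretty

# ascii: keep the output pure ASCII by escaping non-ASCII characters
# as \uXXXX sequences.
printf '%s\n' "$json" | json_pp -json_opt ascii,pretty
```

Either option should sidestep the "Wide character in print" warning; which
one is preferable depends on whether the output is meant for a UTF-8
terminal or for further bytewise processing.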

I've forwarded this upstream to see whether it is practical to make
this more user friendly.

Dominic.



