Bug#1020574: perl-doc: encoding issue / spelling mistake with "perldoc perlfaq4"

Russ Allbery rra at debian.org
Mon Sep 26 20:56:40 BST 2022


Vincent Lefevre <vincent at vinc17.net> writes:

> "perldoc perlfaq4" gives in UTF-8 locales

> [...]
>     The trick to this problem is avoiding accidental autovivification. If
>     you want to check three keys deep, you might na<EF>vely try this:

> where <EF> is actually the EF byte as shown by the "less" pager.

> This should be encoded in UTF-8. However, this is a spelling mistake:
> contrary to French, there is no ï in English (at least, my dictionaries
> cannot find such a variant): naively.

Okay, I think I found a working solution for this.

The root of the problem was that Pod::Text's default position on encoding
is to copy the encoding of the input file to the output (in part for
historical reasons).  I had assumed that by the time it saw the first
non-ASCII character, Pod::Simple would already have decided on an encoding
(since otherwise where would the character have come from?).

What I hadn't thought about was the case where that character comes from
an E<> escape.  In this case, Pod::Simple still has no detected encoding,
and therefore Pod::Text printed the character without encoding at all.
For various historical reasons I think that results in characters in the
ISO 8859-1 range being printed as single bytes.
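Here is a small Python illustration of the byte-level symptom. It is Python rather than Perl and is not podlators code. The character in question is U+00EF (ï). Written as a single ISO 8859-1 byte, it is 0xEF, which is not valid UTF-8 on its own, so a pager such as less in a UTF-8 locale shows it as <EF>. Encoded as UTF-8, it would be the two bytes 0xC3 0xAF.

```python
# Illustration only (Python, not Perl/podlators): how U+00EF ("ï") looks
# as raw bytes with and without a UTF-8 output encoding.
ch = "\N{LATIN SMALL LETTER I WITH DIAERESIS}"  # U+00EF

latin1_bytes = ch.encode("iso-8859-1")  # one byte, like the unencoded output
utf8_bytes = ch.encode("utf-8")         # what a UTF-8 locale expects

print(latin1_bytes)  # b'\xef'
print(utf8_bytes)    # b'\xc3\xaf'

# A lone 0xEF is not valid UTF-8, which is why a UTF-8 pager such as
# less falls back to displaying it as <EF>.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```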

The solution I've implemented for the upcoming 5.00 podlators release is
to teach Pod::Text that once it sees a non-ASCII character, it has to
commit to an output encoding.  In the absence of any other information, it
commits to UTF-8, since that's the most likely to work these days.
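The following Python sketch shows the rule described above. It uses a hypothetical CommittingWriter class, which is neither Pod::Text's actual implementation nor a real API. The output encoding stays undecided while only ASCII has been written. The first non-ASCII character forces a commitment: to the detected input encoding if there is one, and otherwise to UTF-8.

```python
from typing import List, Optional


class CommittingWriter:
    """Hypothetical model of 'commit to an encoding on first non-ASCII'.

    Not Pod::Text code; it only models the decision described in the text.
    """

    def __init__(self, explicit_encoding: Optional[str] = None) -> None:
        # An encoding chosen by the caller is used from the start.
        self.encoding: Optional[str] = explicit_encoding
        # Set if the parser detected an input encoding (assumption).
        self.detected_input_encoding: Optional[str] = None
        self.chunks: List[bytes] = []

    def write(self, text: str) -> None:
        if self.encoding is None and not text.isascii():
            # First non-ASCII character: commit now. Prefer the input
            # file's encoding; with no other information, use UTF-8.
            self.encoding = self.detected_input_encoding or "utf-8"
        # Before any commitment, everything written is ASCII, so
        # encoding it as ASCII is safe.
        self.chunks.append(text.encode(self.encoding or "ascii"))

    def output(self) -> bytes:
        return b"".join(self.chunks)


# The perlfaq4 case: the character comes from an E<> escape and no
# input encoding was ever detected.
w = CommittingWriter()
w.write("you might na")
w.write("\u00ef")
w.write("vely try this")
print(w.encoding)  # utf-8
print(w.output())  # b'you might na\xc3\xafvely try this'
```

Deferring the decision means pure-ASCII output is never affected by the choice.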

It would arguably be better for it to commit to the locale's native
encoding (well, unless it's ASCII because the user doesn't have a locale
or has the C locale set).  Unfortunately, this requires Encode::Locale,
and Pod::Text is a core module, so I didn't want to add a dependency.
After thinking about this for a bit, I decided to just document this and
leave it up to the caller, who will be able to load Encode::Locale and
explicitly pass Pod::Text an encoding of "locale" if it wants to.
That's something for perldoc to consider (although it has the same core
module problem).
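A caller that prefers the locale's encoding might decide along the lines of this Python sketch. Python's locale.getpreferredencoding stands in here for Perl's Encode::Locale, and choose_output_encoding is a hypothetical helper, not part of perldoc or Pod::Text. An explicit encoding wins. Otherwise the locale's encoding is used, unless it is plain ASCII (for example, under the C locale), in which case the helper falls back to UTF-8.

```python
import codecs
import locale
from typing import Optional


def choose_output_encoding(explicit: Optional[str] = None,
                           locale_encoding: Optional[str] = None) -> str:
    """Hypothetical helper: pick an output encoding for rendered POD."""
    if explicit is not None:
        # The caller knows best; normalize the name via the codec registry.
        return codecs.lookup(explicit).name
    if locale_encoding is None:
        # Python analogue of Encode::Locale's "locale" encoding.
        locale_encoding = locale.getpreferredencoding(False)
    name = codecs.lookup(locale_encoding).name
    if name == "ascii":
        # No usable locale (e.g. the C locale reports ANSI_X3.4-1968):
        # ASCII cannot represent the text, so fall back to UTF-8.
        return "utf-8"
    return name


# Result depends on the environment, so no fixed output is shown.
print(choose_output_encoding())
```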

podlators 5.00 will be a major release with breaking changes, since I'm
finally going to change the default Pod::Man output encoding to UTF-8 and
get rid of a lot of the transformations that it tries to do for troff
output that have caused a lot of headaches.  So it may take a bit longer
than usual to propagate into a Perl release and then into Debian.

-- 
Russ Allbery (rra at debian.org)              <https://www.eyrie.org/~eagle/>