Bug#492037: Bug#500210: perldoc perlrun spits out junk in synopsis

Niko Tyni ntyni at debian.org
Sun May 22 06:01:23 UTC 2011


On Sat, May 21, 2011 at 03:56:16PM +0100, Dominic Hargreaves wrote:

> As far as I can see, pod2man --utf8 now exists, but will not render
> all documents correctly - possibly =encoding UTF8 is needed for this
> to work.
> 
> Is this statement still true, or has any progress happened since the
> last message on this bug which I've missed?

It's clearly still true, and I can't see any fix for it other than adding
=encoding utf8 lines in the POD files where necessary.

However, I think all the documents that are rendered incorrectly with
--utf8 are already rendered incorrectly now, albeit in a different
way. See below.

Incorrect (double encoded) output with a missing =encoding utf8:

 perl -CO -Mcharnames=:full -E 'say qq(=head1 \N{LATIN SMALL LETTER A WITH DIAERESIS}\n)' | pod2man --utf8 | grep '^\.SH'
 .SH "Ã¤"

Correct output:

 perl -CO -Mcharnames=:full -E 'say qq(=encoding utf8\n\n=head1 \N{LATIN SMALL LETTER A WITH DIAERESIS}\n)' | pod2man --utf8 | grep '^\.SH'
 .SH "ä"

Current behaviour for UTF-8 with a missing =encoding utf8 is just as broken:

 perl -CO -Mcharnames=:full -E 'say qq(=head1 \N{LATIN SMALL LETTER A WITH DIAERESIS}\n)' | pod2man | grep '^\.SH'
 .SH "A\*~X"

Pure Latin-1 input without an =encoding line works with both, of course:

 perl -Mcharnames=:full -E 'say qq(=head1 \N{LATIN SMALL LETTER A WITH DIAERESIS}\n)' | pod2man | grep '^\.SH'
 .SH "a\*:"

 perl -Mcharnames=:full -E 'say qq(=head1 \N{LATIN SMALL LETTER A WITH DIAERESIS}\n)' | pod2man --utf8 | grep '^\.SH'
 .SH "ä"
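
The double encoding in the first example can be reproduced without
pod2man. This is only a sketch of the mechanism, assuming that input
with no =encoding line is treated as Latin-1 and then written out as
UTF-8; double_encode is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: reinterpret stdin as Latin-1 and re-encode it as
# UTF-8. This mimics what happens to UTF-8 input when the POD carries no
# =encoding declaration and the output is written as UTF-8.
double_encode() {
    iconv -f ISO-8859-1 -t UTF-8
}

# U+00E4 (a with diaeresis) is the two bytes c3 a4 in UTF-8.
printf '\303\244' | od -An -tx1

# After the extra encoding step each byte is encoded separately,
# giving c3 83 c2 a4 (U+00C3 followed by U+00A4).
printf '\303\244' | double_encode | od -An -tx1
```

With a correct =encoding utf8 line the input is decoded first, so the
extra step never happens and the two bytes come out unchanged.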
 
A quick check [1] on my system gives 26 files in /usr/share/perl5 that
use UTF-8 characters in the POD part but don't declare an =encoding
utf-8. All of them that I checked have broken manpages already (except
Spiffy.pm which has been fixed with a hack, see #441828.)

The proposed change of using --utf8 by default would just break these
in a different way AFAICS.

(This looks like something lintian could detect.)

[1]  find . -name '*.pm' -o -name '*.pod' | while read i; do if ! podselect $i | perl -ne '$e++ if /^=encoding/; exit 1 if /[\200-\377]/ && !$e' && iconv -f utf8 -t utf8 $i >/dev/null 2>&1; then echo $i; fi; done
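
For readability, the same check can be spelled out as shell functions.
This is a sketch rather than a drop-in replacement: the function names
are made up, and POD is extracted with a rough Perl flip-flop instead
of podselect, so it may misjudge files with unusual POD layout.

```shell
# Hypothetical helper: succeeds if FILE has a non-ASCII byte in its POD
# before any =encoding line, and the file as a whole is valid UTF-8.
needs_encoding_decl() {
    pod_file=$1
    # Approximate POD extraction: lines from a "=command" up to "=cut".
    # (podselect, as used in the one-liner, is the more faithful choice.)
    if perl -ne '
            next unless /^=[a-zA-Z]/ .. /^=cut/;
            $e++ if /^=encoding/;
            exit 1 if /[\200-\377]/ && !$e;
        ' "$pod_file"
    then
        return 1    # no undeclared non-ASCII bytes in the POD
    fi
    # Only report files that are valid UTF-8; Latin-1 files fail here.
    iconv -f UTF-8 -t UTF-8 "$pod_file" >/dev/null 2>&1
}

# Hypothetical helper: print every .pm/.pod file under DIR that needs
# an =encoding declaration according to the check above.
scan_tree() {
    find "$1" \( -name '*.pm' -o -name '*.pod' \) -type f |
    while IFS= read -r f; do
        if needs_encoding_decl "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running scan_tree /usr/share/perl5 would then list the candidates;
quoting the file names also keeps paths with spaces intact.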

> Note that <http://rt.cpan.org/Public/Bug/Display.html?id=39000>
> still has the patch from Niko with no further comment, so once we
> understand the current situation it would probably make sense to
> comment on that bug, to avoid anyone taking that and repeating work.

I see Porting/Maintainers.pl says blead is upstream for Pod-Perldoc,
so I seem to have filed the above ticket in the wrong place.
-- 
Niko Tyni   ntyni at debian.org





