Bug#492037: Bug#500210: perldoc perlrun spits out junk in synopsis

Dominic Hargreaves dom at earth.li
Sat May 21 14:56:16 UTC 2011


On Wed, Oct 01, 2008 at 02:10:53AM -0700, Russ Allbery wrote:
> Niko Tyni <ntyni at debian.org> writes:
> 
> > Any estimate on how widespread this POD problem is? Is the hardcoded
> > 'pod2man --utf8' in the Lenny perldoc going to cause more grief than
> > it's worth?
> >
> > I'm leaning towards reverting that and reopening #492037 until the issue is
> > sorted out in Pod-Perldoc upstream. Adding a way to enable or disable
> > the '--utf8' option on the perldoc command line is one possibility,
> > but it might also cause further trouble if upstream chooses a
> > different implementation.
> 
> I looked at this some more, and there's a deeper problem.  If you run the
> current pod2man with --utf8 on an input POD file that doesn't declare an
> =encoding of UTF-8, any use of S<> in that POD file will result in invalid
> UTF-8, even if there's no use of high-bit characters in the input POD at
> all.
> 
> I think the core problem was that Pod::Man is responsible for the output
> through the file handle and was missing an encoding layer.  The problem is
> that we can't just call encode() on the output, since that breaks if
> PERL_UNICODE is set or if an encoding was manually set on the file handle.
> You get double-encoding.  I think the least bad option is for Pod::Man and
> Pod::Text to force the encoding on their output file handles to UTF-8 when
> --utf8 is given.
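The double-encoding problem can be sketched in Python as well, under these assumptions. perl_encode() is a hypothetical stand-in for Encode::encode('UTF-8', ...), which returns a Perl byte string. write_through_utf8_layer() models a handle that already carries a UTF-8 layer, for example one set through PERL_UNICODE. Forcing the layer alone produces correct output. Calling encode() and then writing through the layer encodes the text twice.

```python
import io


def perl_encode(text: str) -> str:
    """Hypothetical model of Encode::encode('UTF-8', ...): the result is a
    byte string, represented here as a str whose characters are all < 256."""
    return text.encode("utf-8").decode("latin-1")


def write_through_utf8_layer(text: str) -> bytes:
    """Model a file handle that already has a UTF-8 encoding layer."""
    buf = io.BytesIO()
    handle = io.TextIOWrapper(buf, encoding="utf-8")
    handle.write(text)
    handle.flush()
    return buf.getvalue()


word = "caf\u00e9"
once = write_through_utf8_layer(word)                # layer only
twice = write_through_utf8_layer(perl_encode(word))  # encode() plus layer
print(once)                   # b'caf\xc3\xa9'
print(twice)                  # b'caf\xc3\x83\xc2\xa9'
print(twice.decode("utf-8"))  # cafÃ© (the classic double-encoding symptom)
```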
> 
> The problem with this fix is that this now really will break pod2man
> --utf8 if POD documents don't have their encoding declared properly, since
> it will end up double-encoding the UTF-8 given that, without =encoding,
> Pod::Simple is treating the input as ISO 8859-15.  I think it's correct
> according to the specifications, but existing POD text that doesn't
> declare an encoding will get double-encoded output.  I can work around
> this by not setting a UTF-8 output encoding unless the input encoding is
> detected as UTF-8, but that's not really correct.  You *should* be able to
> have an input POD document with =encoding ISO-8859-1 and run it through
> pod2man --utf8 and get UTF-8 output.  But a POD document with no
> =encoding according to perlpodspec has an implicit =encoding ISO-8859-1.
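The input side of the problem can be sketched the same way. The hypothetical render_utf8() below decodes according to the =encoding line and emits UTF-8. When no =encoding line is present, it falls back to ISO-8859-1, the perlpodspec default. The message notes that Pod::Simple actually uses ISO 8859-15, but that charset is identical to ISO-8859-1 for the bytes used here. A declared ISO-8859-1 document converts cleanly. An undeclared document that really contains UTF-8 comes out double-encoded.

```python
import re

# perlpodspec: POD with no =encoding line is treated as ISO-8859-1.
DEFAULT_POD_ENCODING = "iso-8859-1"


def declared_encoding(pod: bytes) -> str:
    """Return the =encoding value, or the implicit default if none is given."""
    match = re.search(rb"^=encoding\s+(\S+)", pod, re.MULTILINE)
    return match.group(1).decode("ascii") if match else DEFAULT_POD_ENCODING


def render_utf8(pod: bytes) -> bytes:
    """Hypothetical --utf8 pipeline: decode per the declaration, emit UTF-8."""
    return pod.decode(declared_encoding(pod)).encode("utf-8")


latin1_doc = b"=encoding ISO-8859-1\n\ncaf\xe9\n"
utf8_doc = b"=encoding UTF-8\n\ncaf\xc3\xa9\n"
undeclared_doc = b"=pod\n\ncaf\xc3\xa9\n"  # UTF-8 bytes, no =encoding line

for doc in (latin1_doc, utf8_doc, undeclared_doc):
    print(render_utf8(doc).splitlines()[-1])
# b'caf\xc3\xa9'          correct
# b'caf\xc3\xa9'          correct
# b'caf\xc3\x83\xc2\xa9'  double-encoded ("cafÃ©")
```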
> 
> Pod::Text has an additional challenge.  pod2man won't produce any
> non-ASCII characters without --utf8 and has been that way since the
> beginning of the Pod::Simple implementation.  pod2text, on the other hand,
> always passed through whatever it got.  I could just leave it alone, but
> if you feed the current pod2text a document that *does* have =encoding
> UTF-8 in it, you get Perl warnings about wide characters on output.  I
> think the best solution here is to force the output file handle to have an
> encoding matching what Pod::Simple believes the input encoding is.  This
> comes the closest to preserving the traditional pass-through behavior.
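The pass-through idea for pod2text can be sketched roughly as follows: force the output encoding to match the detected input encoding. The helper names are hypothetical. In Perl, printing a character above 255 to a layer-less handle produces a "Wide character in print" warning. Python raises UnicodeEncodeError in that situation, so this sketch reports failure instead of warning.

```python
import re

# perlpodspec: POD with no =encoding line is treated as ISO-8859-1.
DEFAULT_POD_ENCODING = "iso-8859-1"


def declared_encoding(pod: bytes) -> str:
    """Return the =encoding value, or the implicit default."""
    match = re.search(rb"^=encoding\s+(\S+)", pod, re.MULTILINE)
    return match.group(1).decode("ascii") if match else DEFAULT_POD_ENCODING


def raw_write_ok(text: str) -> bool:
    """Model a layer-less handle: only characters below 256 fit in one byte.
    Perl would warn 'Wide character in print'; here we report failure."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def render_passthrough(pod: bytes) -> bytes:
    """Hypothetical pod2text output step: re-encode with the input encoding."""
    encoding = declared_encoding(pod)
    return pod.decode(encoding).encode(encoding)


utf8_pod = "=encoding UTF-8\n\nPrice: 5 \u20ac\n".encode("utf-8")
latin1_pod = b"=pod\n\ncaf\xe9\n"  # no =encoding line

print(raw_write_ok(utf8_pod.decode("utf-8")))        # False: U+20AC is wide
print(render_passthrough(utf8_pod) == utf8_pod)      # True
print(render_passthrough(latin1_pod) == latin1_pod)  # True
```

In both cases the bytes come out exactly as they went in, which matches the traditional pod2text behaviour.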
> 
> I think that for lenny you may want to back out of the --utf8 change and
> give it some time to settle.

[the --utf8 change being the change to have perldoc run pod2man with
the --utf8 option by default].

I've spent a bit of time reading through #492037 (this bug) and
#480997 (which was resolved) trying to figure out how to progress
this issue.

As far as I can see, pod2man --utf8 now exists but will not render
all documents correctly; possibly the input POD needs an explicit
=encoding UTF-8 for this to work.

Is this statement still true, or has there been any progress since the
last message on this bug that I've missed?

Note that <http://rt.cpan.org/Public/Bug/Display.html?id=39000>
still has Niko's patch with no further comment, so once we
understand the current situation it would probably make sense to
comment on that bug, to avoid anyone picking up that patch and
duplicating work.

Dominic.

-- 
Dominic Hargreaves | http://www.larted.org.uk/~dom/
PGP key 5178E2A5 from the.earth.li (keyserver,web,email)

More information about the Perl-maintainers mailing list