[Po4a-devel]HTML translating

Yves Rutschle debian.anti-spam@rutschle.net
Wed, 10 Nov 2004 17:36:29 +0000


On Mon, Nov 08, 2004 at 02:22:42PM +0100, Martin Quinson wrote:
> Ok. I wanted to reply to this message the way it deserves (with a long
> argumentation to base my point)

Thank you for sharing your experience; I'm getting convinced
now.

> If you think that such issues are seldom and dealable with, type 
> man Locale::Maketext::TPJ13 in a terminal ;)

I read that article a long time ago, printed in a real paper
version of TPJ... I think that's actually the single most
interesting article I read in all TPJs :)

[splitting HTML into blocks]
> > That's actually fairly easily achievable: the list of
> > paragraph-marking tags is fairly small (<p>, <div>,
> > <h1,2,3,4,...>) and XHTML makes it mandatory for text to be
> > included in a block-level element of some sort.
> 
> You thus have to show some formatting tags to the translators. We do so in
> all other modules. I don't see any better idea.

Ok. Well, I'm afraid that means I'm gonna have to ditch the
current Html.pm and redo one from scratch (bar a couple of
routines that may be rescued).

So, we'll now be cutting the HTML along blocks and displaying
formatting tags inline (at first sight, it looks like
cutting along tags that have a 'display: block' property,
while keeping those that have a 'display: inline' property).
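
To make the idea concrete, here is a rough sketch of that cutting,
written in Python with the standard html.parser module rather than
in Perl with HTML::Parser. The BLOCK_TAGS set, the BlockSplitter
class and split_blocks() are made-up names for illustration, and the
tag list is an assumption that is certainly incomplete:

```python
from html.parser import HTMLParser

# Assumed, deliberately incomplete list of block-level tags.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "th"}


class BlockSplitter(HTMLParser):
    """Collect one string per block-level element, keeping inline tags."""

    def __init__(self):
        # Keep entities as written so the translator sees the original text.
        super().__init__(convert_charrefs=False)
        self.blocks = []   # finished translatable units
        self.current = []  # pieces of the unit being built

    def flush(self):
        # Join the pieces, collapse whitespace, and store non-empty units.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()  # a block boundary ends the current unit
        else:
            self.current.append(self.get_starttag_text())  # inline: keep as is

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept verbatim.
        if tag not in BLOCK_TAGS:
            self.current.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        else:
            self.current.append(f"</{tag}>")

    def handle_data(self, data):
        self.current.append(data)

    def handle_entityref(self, name):
        self.current.append(f"&{name};")

    def handle_charref(self, name):
        self.current.append(f"&#{name};")


def split_blocks(html):
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # text left over after the last block tag
    return parser.blocks


print(split_blocks("<h1>Title</h1><p>Some <b>bold</b> text</p>"))
```

Each returned string would become one msgid. Nested blocks (a <p>
inside a <div>, say) simply end up as separate units here, which may
or may not be what we want.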

While thinking about it there is at least one thing I'd like
feedback on: I'd personally rather not expose "complicated"
tags to the translator, i.e. while I think it's acceptable
to present them with <b> and <i> and so on, I don't think
something like:

This is a <a href="blahblah.com/this/that/blah.html">link</a> to <img src="blahblah.com/this/that/blah.png" alt="blah" title="Blah">

belongs in a PO.

So I'd propose to collapse the inside of long inline tags,
so as to simply state there is a tag (e.g. "you're in a
link") without detailing what the tag contains. Thus, the
example line would appear, in the PO, as:

This is a <a>link</a> to <img>blah</img>

(Meanwhile we also output the title field of the img as a
separate msgid; the alt field is a replacement for the image
for text browsers, and therefore belongs interpolated in the
rest of the text).
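
As a rough illustration of the collapsing, here is a sketch in
Python using the standard html.parser module (not HTML::Parser, and
not actual po4a code). TagCollapser and collapse_inline() are
hypothetical names, and the handling is limited to what is described
above: attributes are dropped, the alt text is interpolated, and
title values are collected as separate msgids.

```python
from html.parser import HTMLParser


class TagCollapser(HTMLParser):
    """Strip attributes from inline tags; collect title texts separately."""

    def __init__(self):
        super().__init__()
        self.parts = []         # pieces of the collapsed msgid
        self.extra_msgids = []  # title attributes, one msgid each

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # The alt text stands in for the image, so it goes inline.
            self.parts.append(f"<img>{attrs.get('alt') or ''}</img>")
        else:
            self.parts.append(f"<{tag}>")
        if attrs.get("title"):
            self.extra_msgids.append(attrs["title"])

    def handle_endtag(self, tag):
        if tag != "img":  # <img> was already closed in handle_starttag
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)


def collapse_inline(fragment):
    parser = TagCollapser()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts), parser.extra_msgids


msgid, extras = collapse_inline(
    'This is a <a href="blahblah.com/this/that/blah.html">link</a> to '
    '<img src="blahblah.com/this/that/blah.png" alt="blah" title="Blah">'
)
print(msgid)   # This is a <a>link</a> to <img>blah</img>
print(extras)  # ['Blah']
```

The sketch throws the attributes away; a real module would have to
remember them somewhere so they can be put back when the translated
document is generated.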

One argument to expose the full tag would be that it allows
the translator to update links (change a link to blah.html
into a link to blah.fr.html for example), allowing the
complete translation of a Web site. I'm not fond of the idea
though, as:
- The translator doesn't necessarily know how the translation
  would be implemented
- The burden of maintaining the Web site should not be on
  the translator
- A small script can easily take care of that (I'll be happy
  to provide what I've written later on, but I'm not sure it
  belongs in po4a).
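
To give an idea of how small such a script can be, here is a sketch
in Python. It is not the script mentioned above, only an
illustration. It assumes the blah.html -> blah.fr.html naming scheme
from the example, only looks at double-quoted href attributes, and
localize_links() is a made-up name:

```python
import re

# Matches href="<path>.html" followed by the closing quote, a fragment
# or a query string. Only double-quoted attributes are handled.
HREF_RE = re.compile(r'(href=")([^"#?]+)\.html(["#?])')

# A leading URL scheme such as "http:" marks an absolute link.
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def localize_links(html, lang):
    """Rewrite local links from x.html to x.<lang>.html."""

    def repl(match):
        prefix, path, tail = match.groups()
        if SCHEME_RE.match(path) or path.startswith("//"):
            return match.group(0)  # external link: leave untouched
        if path.endswith(f".{lang}"):
            return match.group(0)  # already points at the translation
        return f"{prefix}{path}.{lang}.html{tail}"

    return HREF_RE.sub(repl, html)


print(localize_links('<a href="blah.html">link</a>', "fr"))
# <a href="blah.fr.html">link</a>
```

Run over the generated pages after translation, something like this
keeps link maintenance out of the PO files entirely.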

Any comments on this?

[HTML::Parser vs Jordi's XML parser]
> Moreover, I'd be pleased to cut a dependency. I hate unjustified
> dependencies, but it may be personal.

Me too, but I hate reimplementation of code (reinventing the
wheel) more. Besides, HTML::Parser is also quite widespread,
and only one apt-get away at worst (or emerge, or
whatever -- if it's not one command away, you need a better
distribution :) )


Y.