[Po4a-devel] [po4a-Bugs][300622] automatic classification of docbook tags

po4a-bugs at alioth.debian.org po4a-bugs at alioth.debian.org
Thu Jul 29 13:14:29 UTC 2010


Bugs item #300622, was changed at 2004-04-06 23:07 by Denis Barbier
You can respond by visiting: 
https://alioth.debian.org/tracker/?func=detail&atid=410622&aid=300622&group_id=30267

>Status: Closed
Priority: 3
Submitted By: Martin Quinson (mquinson-guest)
Assigned to: Nobody (None)
Summary: automatic classification of docbook tags 
Category: None
Group: None
>Resolution: Fixed


Initial Comment:
[This was #6454 from Savannah]

*** fgouget reported ***
Here's the problem: the 'command' DocBook tag can be used either inline, or in a tag that does not contain text, e.g. 'cmdsynopsis'. Because of the latter case I had put 'command' in the 'translate' category. But this causes 'The <command>ls</> command prints the directory contents.' to generate:

msgid "The"
msgstr ""

msgid "ls"
msgstr ""

msgid "command prints the directory contents."
msgstr ""

The above is too fragmented to be translated. I can fix this by moving the 'command' tag to the 'ignore' category. It still works if I put it inside a 'cmdsynopsis' tag, but then we rely on 'cmdsynopsis' being in the 'indent' category.
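
To make the splitting concrete, here is a minimal Python sketch of the behaviour described above. It is not po4a's implementation: `extract_msgids` and its `translate_tags` parameter are hypothetical names, and it assumes well-formed input in which every close tag (including the SGML short form `</>`) closes the most recently opened tag.

```python
import re

def extract_msgids(text, translate_tags):
    """Split text into msgids, cutting at every tag listed in translate_tags.

    Tags that are not listed behave like the 'ignore' category:
    they are left untouched inside the surrounding msgid.
    """
    msgids, current, open_tags = [], "", []

    def flush():
        nonlocal current
        if current.strip():
            msgids.append(current.strip())
        current = ""

    # A token is either a tag (<name>, </name>, </>) or a run of text.
    for token in re.findall(r"<[^>]*>|[^<]+", text):
        if token.startswith("</"):
            name = open_tags.pop()          # assumes well-formed nesting
            if name in translate_tags:
                flush()                     # end of a translated tag: cut here
            else:
                current += token            # ignored tag: keep it inline
        elif token.startswith("<"):
            name = token[1:-1].split()[0]
            open_tags.append(name)
            if name in translate_tags:
                flush()                     # start of a translated tag: cut here
            else:
                current += token
        else:
            current += token
    flush()
    return msgids

sentence = "The <command>ls</> command prints the directory contents."
print(extract_msgids(sentence, {"command"}))  # three fragments
print(extract_msgids(sentence, set()))        # one msgid, tag kept inline
```

With 'command' in the translate set the sentence is cut into the three fragments shown above; with 'command' left out (that is, ignored), the whole sentence stays in a single msgid with the tag embedded.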

Is this the right thing to do?
How does one determine in which category to put tags? 

**** mquinson answered ****
Yes, this is bad. I did move command to ignore, and cmdsynopsis is already in the indent list. I'll release 0.15.4 soon.

I'd really like to know how to determine automatically which category each tag falls into. That would allow me to write an automatic DTD parser that determines this, and to get rid of the manual classification.

By the way, it would allow this module to be used with all existing DTDs for free. For example, HTML could be translated that way, which is not the case for now.

I have no idea how to do that right now, and too little time to investigate the issue. If you want to do this, I would be more than pleased to help you if I can, and to integrate it when it works. If you don't want to, we will have to fix the list manually together.

I'll do as many releases as needed to get that list right if you choose the second option.
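
One possible starting point for the automatic classification wished for above is to read the content models in the DTD. The Python sketch below is only a heuristic under stated assumptions, not anything po4a does: `classify_dtd` is a hypothetical name, the toy DTD is invented for illustration, and the parser only understands plain XML-style `<!ELEMENT ...>` declarations (no parameter entities, no SGML tag-minimization flags). It cannot detect the 'verbatim' category and does not distinguish 'indent' from 'translate'.

```python
import re

# Matches <!ELEMENT name content-model>; assumes no '>' inside the model.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)>", re.DOTALL)
# Element names inside a content model, skipping the #PCDATA keyword.
NAME_RE = re.compile(r"(?<!#)\b[A-Za-z][\w.-]*")

def classify_dtd(dtd_text):
    """Guess a category for each element from its content model (heuristic)."""
    models = {name: model.strip() for name, model in ELEMENT_RE.findall(dtd_text)}

    # An element that may sit next to text in some mixed-content model
    # can occur mid-sentence, so it should not split the text.
    inline = set()
    for model in models.values():
        if "#PCDATA" in model:
            inline.update(NAME_RE.findall(model))

    categories = {}
    for name, model in models.items():
        if model == "EMPTY":
            categories[name] = "empty"
        elif name in inline:
            categories[name] = "ignore"
        elif "#PCDATA" in model:
            categories[name] = "translate"
        else:
            categories[name] = "section"   # contains only other elements
    return categories

TOY_DTD = """
<!ELEMENT sect1 (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA | command)*>
<!ELEMENT command (#PCDATA)>
<!ELEMENT cmdsynopsis (command+)>
<!ELEMENT xref EMPTY>
"""

for tag, category in classify_dtd(TOY_DTD).items():
    print(tag, category)
```

On this toy DTD 'command' lands in 'ignore' because it may appear inside the mixed content of 'para', which matches the manual fix. 'cmdsynopsis', however, comes out as 'section', whereas the hand-written list keeps it in 'indent', so the heuristic would need refining before it could replace that list.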

*** fgouget answered ***
I'm willing to go the second route (using the Wine doc as the testbed) and maybe determine an algorithm for classifying the tags (I'll leave the implementation as an exercise<g>). But could you check, complete and correct the following category descriptions?

* empty
Tags that cannot contain text or other tags.

* verbatim
Tags in which the text layout (spaces and newlines) is important. Such text should not be reformatted before being put in po format.

* translate
Tags containing text to be translated. When po4a-gettextize finds such a tag it will cut short the current text, if any, and put it in an msgid, and then start a new msgid entry with the text contained inside the tag to be translated.
(maybe that's a bug: should po4a-gettextize behave differently when inside #PCDATA content? It would make sense to ignore all tags in that case in DocBook, maybe not in other DTDs?)
This means inline tags must not be put in this category as it would otherwise cut text mid-sentence, making them impossible to translate.
Difference with indent???

* section
Tags that cannot contain text to be translated but can contain other tags.
I believe these tags cannot appear inside tags that are translatable, such as tags in the translate and indent categories. What's the point?

* indent
Indented tags? Indented where? What's this indentation for? Difference with section and especially translate?

* ignore
These tags are ignored and left in the stream of text to be translated. Inline tags must be put in this category so that they don't cause po4a-gettextize to split text mid-sentence.
If an ignore tag is not in an inline context and its parent is in the translate or indent category, it is put straight into the stream of text to be translated. Otherwise?

*** mquinson answered ***
Ok, I'll try to document this in the Sgml.pm documentation. It will give us a basis for discussion ;)

In short, translate=indent in the current code (and I don't remember why I introduced indent); section is different so that the source of the translated document looks better.

But this module is so old that I may be remembering wrong ;)

----------------------------------------------------------------------

>Comment By: Denis Barbier (barbier-guest)
Date: 2010-07-29 15:14

Message:
I agree with Jordi, and as Locale::Po4a::Docbook is there now (well, for a long time already), I am closing this bug.

----------------------------------------------------------------------

Comment By: Jordi Vilalta (jvprat-guest)
Date: 2004-08-09 23:31

Message:

In the Xml module, the same tag can be inline (conditionally, depending on which tag it is inside) and translatable (if it isn't inside another translatable tag).

Also, the Xml module handles all tags as if they were in the "ignore" category, and then you say which ones you want treated differently.

I think this can be enough in this case.

If so, once an Xml-derived DocBook module appears, this bug can be closed.
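
The context-dependent behaviour described in this comment can be illustrated with a small Python sketch. It is not the Locale::Po4a::Xml API: the `role` function and the `TRANSLATED` set are hypothetical, and the tag list is chosen only for illustration.

```python
# Hypothetical override list: tags to extract as their own msgid.
# Every tag not listed defaults to "ignore" and stays inline in the text.
TRANSLATED = {"para", "title", "command"}

def role(tag, ancestors):
    """Return how a tag is treated, given the names of its enclosing tags."""
    if tag not in TRANSLATED:
        return "ignore"
    if any(a in TRANSLATED for a in ancestors):
        # Already inside translatable text: do not cut the sentence.
        return "inline"
    return "translate"

print(role("command", ["sect1", "para"]))   # inside a paragraph
print(role("command", ["cmdsynopsis"]))     # inside a non-text container
print(role("emphasis", ["para"]))           # not listed at all
```

Under these assumptions 'command' is inline inside 'para' but translated on its own inside 'cmdsynopsis', which is the dual behaviour the original report asked for.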

----------------------------------------------------------------------
