[Po4a-devel] Some comments

Jordi Vilalta jvprat@wanadoo.es
Fri, 7 May 2004 15:43:38 +0200 (CEST)


Hi,

When I tried the latest CVS version, I got this:

$ perl Makefile.PL
Checking if your kit is complete...
Warning: the following files are missing in your kit:
        t/data-20/text.xml
        t/data-20/xml.po
Please inform the author.
Writing Makefile for po4a

So here's the report.

On Tue, 27 Apr 2004, Martin Quinson wrote:
> On Wed, Apr 21, 2004 at 11:44:34PM +0200, Jordi Vilalta wrote:
> > On Wed, 14 Apr 2004, Martin Quinson wrote:
> > > On Tue, Apr 13, 2004 at 08:03:51PM +0200, Jordi Vilalta wrote:
> > >
> > > [skip intro]
> > > 
> > > > Recently I tried po4a-gettextize with a DocBook XML document I'm
> > > > starting to write, and I found some problems.
>  
> [...]
> 
> > > > Then I deleted the first line, which is XML-specific:
> > > > <?xml version="1.0"?>
> > > > and this error disappeared. Apart from this line, the rest is valid SGML
> > > > DocBook. Would there be an easy way to auto-detect and bypass it? (Then
> > > > I think you could officially say that po4a supports DocBook XML documents
> > > > at the same level as DocBook SGML.)
> > > 
> > > I just committed a fix to the CVS. If this line is found when using
> > > the SGML backend, it will print the following warning:
> 
> [...]
> 
> > It detects that line, but now it seems to fail when
> > searching for the DTD:
> > 
> > File kk.xml have an unknown DTD
> > Supported for now: debiandoc, docbook.
> > 
> > It may still be looking at the first line of the file instead of the
> > second.
> 
> Erm, I prefer not to detail what exactly it looks for ;) Let's say that the
> CVS version is "a bit broken". Let me commit what I have locally, and it
> should do the trick.

Now it works as expected :)
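For the record, the check I had in mind can be sketched roughly like this. It is a hypothetical Python illustration, not po4a's actual (Perl) code: it skips an optional <?xml ...?> declaration and only then reads the DOCTYPE, assuming the DTD is identified by its PUBLIC identifier (a SYSTEM-only DOCTYPE is not handled here).

```python
import re

# Hypothetical sketch, not po4a's code: look for the DOCTYPE *after* an
# optional XML declaration instead of on the first line of the file.
XML_DECL = re.compile(r"\s*<\?xml\b[^>]*\?>")
DOCTYPE = re.compile(r'\s*<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def detect_dtd(text):
    """Return 'docbook' or 'debiandoc' from the DOCTYPE public id, else None."""
    pos = 0
    decl = XML_DECL.match(text)
    if decl:
        pos = decl.end()  # skip <?xml version="1.0"?> and continue after it
    doctype = DOCTYPE.match(text, pos)
    if not doctype:
        return None
    public_id = doctype.group(1).lower()
    for name in ("docbook", "debiandoc"):
        if name in public_id:
            return name
    return None
```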

> 
> > Here's what I've tried:
> > ...
> > <!ENTITY aaa "aeiou">
> > ]>
> > ...
> > <para>
> > &aaa;
> > </para>
> > ...
> > 
> > and in the po generated by po4a-gettextize:
> > 
> > # type: <para></para>
> > #: kk.xml:26
> > msgid "&aaa;"
> > msgstr ""
> 
> And you got nothing, right? 
> 
> po4a skips generating msgids that contain only an entity (or only tags).
> It will now issue a warning when such optimizations are done. Thanks for the
> report. [At least this is what I planned, but msgids containing spaces
> along with entities were not detected. This is also fixed.]

Now it seems to skip this kind of msgid (the version I tried a few days
ago didn't), but its behavior is inconsistent. I did the following
(meaningless) test:

...
<!ENTITY chap SYSTEM "chapter1.xml">
<!ENTITY chap2 SYSTEM "chapter2.xml">
<!ENTITY aaa "contents of aaa">
<!ENTITY bbb "contents of bbb">
<!ENTITY ccc "contents of ccc">
]>

<book>
        &chap0;
        &chap;
        &chap2;
        &aaa;
        &chap3;
        &bbb;
        &chap;
        &ccc;
        &aaa;
</book>

And in the generated PO file, this part appears as:

# type: </chapter><chapter>
#: chapter2.xml:30
msgid "&aaa; &chap3; &bbb;"
msgstr ""

# type: </chapter></book>
msgid "&ccc; &aaa;"
msgstr ""

The type line is... pretty weird. It seems to show the last two tags it
has found. I don't know how it's handled internally, but I think it should
work like a stack, pushing the opening tags and popping the closing ones.
This would also be a good way to report badly structured documents (I
don't know whether it currently does).
A stack per file would be even better, so that messages in included
files don't carry tags external to them.
In the example above I think the correct type would be <book>.
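A rough Python sketch of the stack-per-file idea follows. The class and method names are hypothetical, and it assumes the parser reports each opening and closing tag together with the file it comes from.

```python
# Hypothetical sketch of the proposed stack-per-file bookkeeping; the
# event format and names are assumptions, not po4a internals.
class TagStacks:
    def __init__(self):
        self.stacks = {}  # file name -> list of currently open tags

    def open(self, filename, tag):
        self.stacks.setdefault(filename, []).append(tag)

    def close(self, filename, tag):
        stack = self.stacks.get(filename, [])
        if not stack or stack[-1] != tag:
            # Mismatched closing tag: the document is badly structured.
            raise ValueError(f"{filename}: unexpected </{tag}>")
        stack.pop()

    def current_type(self, filename):
        """Innermost open tag in this file, for the '# type:' comment."""
        stack = self.stacks.get(filename, [])
        return f"<{stack[-1]}>" if stack else None
```

With this, the text between the entity references in the main file would get <book> as its type, whatever tags chapter2.xml opened and closed in its own stack.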

The reference line says the message comes from the included file
(chapter2.xml), but it should point to the main file instead.

Looking at the contents of the msgids, it seems to skip only the
inclusion entities it knows about (marked with -> below) and leaves the
"substitution" entities in:

        &chap0;
     -> &chap;
     -> &chap2;
        &aaa;
        &chap3;
        &bbb;
     -> &chap;
        &ccc;
        &aaa;

It treats the rest as 3 separate fragments. The first one (&chap0;)
doesn't appear, because it is effectively a one-entity message. The second
and the third do appear.

I think there are 2 alternative ways to handle these cases better:
  1) Exclude all entity-only messages (any number of entities, known or
     unknown).
  2) Keep whole messages that contain more than 1 entity (known or
     unknown), because in some languages it may be useful to change
     the order of some of them.
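The difference between the two options can be sketched like this. It is hypothetical Python: Policy and keep_message are made-up names, and single-entity messages are assumed to stay excluded under option 2.

```python
import re
from enum import Enum

# Hypothetical sketch comparing the two proposed policies; names and the
# entity regex are illustrative assumptions, not po4a options.
ENTITY = re.compile(r"&[A-Za-z][\w.-]*;")
WHITESPACE = re.compile(r"\s+")


class Policy(Enum):
    EXCLUDE_ENTITY_ONLY = 1  # option 1: drop every entity-only message
    KEEP_MULTI_ENTITY = 2    # option 2: keep messages with 2+ entities


def keep_message(msgid, policy):
    entities = ENTITY.findall(msgid)
    rest = WHITESPACE.sub("", ENTITY.sub("", msgid))
    if rest:  # real text present: always translatable
        return True
    if policy is Policy.EXCLUDE_ENTITY_ONLY:
        return False
    return len(entities) > 1  # keep so translators can reorder entities
```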

Hmm, now I was thinking about the standard entities that define special
characters, such as &acute;, and I've seen that they're also excluded if
there's something like <title>&Acute;</title>. Given this, I prefer not to
exclude any entities. In some cases it can be a little annoying for
translators, but otherwise some strings could become untranslatable.

> 
> Furthermore, it was impossible to translate the content of the entity
> because that was not implemented. This will be fixed as soon as I find the
> time to commit my local version into the CVS. At least, I hope so ;)

Wow! Thanks, it's a great improvement :D I've only done small tests on 
this, but it seems to work well ;)

> 
> > > > The inclusion entities are also important, but if we could treat each file 
> > > > alone it would be good enough for now :)
> > > 
> > > Inclusion entities are handled... What was your problem when using them?
> > 
> > Well, the other day I only tried it with included files that themselves
> > had entities including other files... That wasn't handled. I used the
> > following construction:
> > 
> > ...
> > <!ENTITY % common SYSTEM "common.ent">
> > %common;
> > ]>
> > 
> > I don't know if it's unusual. The DocBook parser accepts it.
> > The common.ent file contains only entity declarations, which are expanded
> > into the main document's DOCTYPE declaration.
> 
> If that's legal (i.e., if nsgmls accepts it), I'll have to accept it. Again,
> please file a bug about this (another one). Don't forget to attach an
> example file that is valid but refused by po4a. For example, are you sure
> that it's %common; and not &common;?

nsgmls accepts them, and it parses the included files (it reported some
errors in them ;)
The %something; entities are a different kind (parameter entities). I don't
know all their properties, but they can be expanded inside the DOCTYPE
declaration, while general (&something;) entities can't. I'll file the bug
shortly.
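The expansion of such parameter entities can be sketched roughly as below. This is a simplified, hypothetical Python illustration using XML-style syntax, with the included files passed in a dict instead of being read from disk; real SGML parsers handle many more cases (declaration order, PUBLIC identifiers, internal parameter entities).

```python
import re

# Hypothetical sketch, not po4a or nsgmls code. Declarations look like
#   <!ENTITY % name SYSTEM "file">
# and a reference %name; is replaced by that file's text (recursively).
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+SYSTEM\s+"([^"]*)"\s*>')
PE_REF = re.compile(r"%([\w.-]+);")


def expand_parameter_entities(subset, files, known=None, depth=0):
    """Expand %name; references; `files` maps system ids to file contents."""
    if depth > 10:
        raise RecursionError("parameter entities nested too deeply")
    known = dict(known or {})
    known.update(PE_DECL.findall(subset))  # name -> system identifier

    def replace(match):
        name = match.group(1)
        if name not in known:
            return match.group(0)  # unknown reference: leave it untouched
        included = files[known[name]]
        return expand_parameter_entities(included, files, known, depth + 1)

    return PE_REF.sub(replace, subset)
```

Passing the known declarations down the recursion lets an included file such as common.ent itself pull in further files, which is the nested case that was not handled.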

Regards,

Jordi Vilalta