[Po4a-devel] Several how-to questions on XML files translated with PO4A

Raphaël Maville rafmav at wanadoo.fr
Mon Nov 24 15:46:04 UTC 2008


Sorry, this is a long message; thank you in advance, whether or not you reply!


------


The writers and maintainers of some documentation files written in XML
put line breaks inside the text to make the XML source easier to read in
editors (both text and XML editors).

And of course, po4a treats the resulting messages as different!

Example:
  # type: Content of: <chapter><sect1><sect2><para><guilabel>
  #: guide/C/ch_basics.xml:278
  #, no-wrap
  msgid ""
  "Transaction\n"
  "      Journal"
  msgstr ""

  # type: Content of: <chapter><sect1><sect2><itemizedlist><listitem><para><guilabel>
  #: guide/C/ch_basics.xml:1324
  #, no-wrap
  msgid "Transaction Journal"
  msgstr ""

In this case, I would like these two strings to be treated as the same
message and grouped like this:

  # type: Content of: <chapter><sect1><sect2><para><guilabel>
  #: guide/C/ch_basics.xml:278
  # type: Content of: <chapter><sect1><sect2><itemizedlist><listitem><para><guilabel>
  #: guide/C/ch_basics.xml:1324
  #, no-wrap
  msgid "Transaction Journal"
  msgstr ""


Question: is it possible, and if so how, to automatically remove these
line breaks while creating the POT and PO files, using only po4a? I mean
without modifying the original XML source before translation...
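
To make the request concrete, here is a minimal Python sketch of the
normalization being asked for. It is not po4a code and says nothing about
how po4a works internally; the function name normalize_msgid is
hypothetical. It only shows that collapsing runs of whitespace makes the
two msgids above identical.

```python
import re

def normalize_msgid(text):
    """Collapse every run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text).strip()

# The two msgids from the example above.
wrapped = "Transaction\n      Journal"
plain = "Transaction Journal"

print(normalize_msgid(wrapped) == normalize_msgid(plain))
```

A side effect of this kind of normalization is that text where line
breaks matter (program listings, for example) must be excluded from it.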


------


The XML files contain inline tags nested inside text tags, and the text is
split into a new msgid/msgstr pair at each tag found inside another tag.
(Long) Example:
   # type: Content of: <chapter><sect1><sect2><para>
   #: guide/C/ch_basics.xml:180
   #, no-wrap
   msgid "An"
   msgstr ""

   # type: Content of: <chapter><sect1><sect2><para>
   #: guide/C/ch_basics.xml:180 guide/C/ch_basics.xml:305
   #, no-wrap
   msgid "account"
   msgstr ""

   # type: Content of: <chapter><sect1><sect2><para>
   #: guide/C/ch_basics.xml:180
   #, no-wrap
   msgid ""
   "is a place for keeping track of\n"
   "      what you own, owe, spend or receive. Although you only have one main\n"
   "      data file, that file will contain many accounts. You probably already\n"
   "      think of money you own or owe as being in an account. For example, at\n"
   "      some point you opened checking and savings accounts at a particular\n"
   "      bank, and that bank sends you monthly statements showing how much money\n"
   "      you"
   msgstr ""

   # type: Content of: <chapter><sect1><sect2><para><emphasis>
   #: guide/C/ch_basics.xml:186
   #, no-wrap
   msgid "own"
   msgstr ""

   # type: Content of: <chapter><sect1><sect2><para>
   #: guide/C/ch_basics.xml:186
   #, no-wrap
   msgid ""
   "in these accounts. Credit card accounts\n"
   "      also send you statements showing what you"
   msgstr ""

   # type: Content of: <chapter><sect1><sect2><para><emphasis>
   #: guide/C/ch_basics.xml:187 guide/C/ch_basics.xml:189
   #, no-wrap
   msgid "owe"
   msgstr ""

   # type: Content of: <chapter><sect1><sect2><para>
   #: guide/C/ch_basics.xml:187
   #, no-wrap
   msgid ""
   "to a\n"
   "      credit card company, and the mortgage company may send you periodic\n"
   "      statements showing how much you still"
   msgstr ""

   # type: Content of: <chapter><sect1><sect2><para>
   #: guide/C/ch_basics.xml:189
   #, no-wrap
   msgid ""
   "on your\n"
   "      loan."
   msgstr ""


The source paragraph behind this example reads as follows (the tags are
shown for precision):

 <para>An <emphasis>account</emphasis> is a place for keeping track of
 what you own, owe, spend or receive. Although you only have one main
 data file, that file will contain many accounts. You probably already
 think of money you own or owe as being in an account. For example, at
 some point you opened checking and savings accounts at a particular
 bank, and that bank sends you monthly statements showing how much money
 you <emphasis>own</emphasis> in these accounts. Credit card accounts
 also send you statements showing what you <emphasis>owe</emphasis> to a
 credit card company, and the mortgage company may send you periodic
 statements showing how much you still <emphasis>owe</emphasis> on your
 loan.</para>

In the documentation this example comes from, some tags contain text or
other tags; for instance, the <para> tag can contain the following tags:
<emphasis>, <quote>, <xref>, <guilabel>, <guibutton>, <guimenu>,
<guimenuitem>, etc.

Splitting sentences or paragraphs this way creates several msgid/msgstr
pairs, with these effects:
- some fragments are split apart even though they are identical and
could be translated once and for all!
- I translate into French, where the word order is often reversed; for
example, we say "un chat noir" (literally "a cat black") for "a black
cat", so splitting sentences and paragraphs makes translation hard!
[As a side note, I found it impossible to use gtranslator or poedit to
translate documentation: once a msgid/msgstr is translated or marked
fuzzy, faulty, etc., it is moved somewhere else in the translation list
or tree, and it is impossible to re-sort the entries by line
number... and these editors risk breaking the XML tags... I use KBabel
instead, which is smarter about all that!]

Questions:
- Is it possible to keep a whole <para> paragraph in a single
msgid/msgstr?
- If yes, how? Through the command-line configuration file, please!
- Will the tags be restored after translation (emphasized or quoted words
left as they are, but translated, etc.)?
The po4a documentation is clear overall, but it lacks examples showing
how to write these options in the configuration file (opt: ...); I mean
options like the ones described for Locale::Po4a::Xml at
http://po4a.alioth.debian.org/man/man3pm/Locale::Po4a::Xml.3pm.php
such as wrap, nostrip, etc. Are these usable in a command-line
configuration file?
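
The behaviour asked for above can be sketched in Python with the standard
xml.etree.ElementTree module. This is only an illustration of the desired
result, not po4a's implementation or its option syntax; the function name
para_as_single_message is hypothetical. Inline tags such as <emphasis>
stay inside the string, so the translator sees the whole paragraph and
can reorder words around them.

```python
import re
import xml.etree.ElementTree as ET

def para_as_single_message(para):
    """Return the content of a <para> element as one string, inline tags kept."""
    parts = [para.text or ""]
    for child in para:
        # ET.tostring() serializes the child element together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    # Collapse the source line breaks and indentation to single spaces.
    return re.sub(r"\s+", " ", "".join(parts)).strip()

source = """<para>An <emphasis>account</emphasis> is a place for keeping
      track of what you <emphasis>own</emphasis>.</para>"""

print(para_as_single_message(ET.fromstring(source)))
```

Because the tags travel with the text, a translated string that keeps
them balanced can be written back into the document unchanged.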


------


Sometimes identical msgid/msgstr entries are grouped, but the correct
translation actually differs depending on the chapter, section,
paragraph, context, sentence, etc.

Questions:
- Is it possible to split them into separate msgid/msgstr entries?
- When is the best moment to do so?
- Perhaps it is best to group them for a single translation, and split
them back into separate msgid/msgstr entries only when needed; I think
this is probably not a po4a problem but one for the translation software.
- In that case, will po4a respect this choice when translating or
updating?
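
For reference, the gettext PO format has a msgctxt field that lets two
entries share the same msgid while carrying different translations.
Whether po4a can produce it for XML input is part of the question here;
the Python sketch below only illustrates the lookup idea. The function
name, the context label and the sample translations are all hypothetical.

```python
def translate(catalog, msgid, context=None):
    """Prefer a context-specific entry; fall back to the shared one, then to msgid."""
    if (context, msgid) in catalog:
        return catalog[(context, msgid)]
    return catalog.get((None, msgid), msgid)

# Illustrative catalog: one shared translation plus one context-specific override.
catalog = {
    (None, "account"): "compte",
    ("ch_basics.xml:305", "account"): "compte bancaire",
}

print(translate(catalog, "account"))
print(translate(catalog, "account", "ch_basics.xml:305"))
```

With this shape, entries stay grouped by default and are split only where
a translator adds a context-specific override.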


------


In fact, the text in the XML file is split "tag by tag", "paragraph by
paragraph" (<para> by <para>), but each paragraph contains phrases and
sentences that often recur throughout the documentation, or at least
contains keywords, or more precisely "groups of keywords" or
"key sentences" (a group of words that is always the same; for example:
"File -> New", "Transaction Journal").
The exact same sentence may appear several times in the documentation to
translate, but the msgid/msgstr to edit is the full paragraph! This is
not useful at all!

Questions:
- Is it possible to create a keyword list with po4a? Or is that a matter
for an external tool (KBabel)?
- How can I tell po4a to split a paragraph into sentences, based on the
period (full stop), colon, semicolon, etc.? But sentences can contain
periods that do not end a sentence (e.g. in $4.5 or in U.N.). Sometimes
the periods are missing altogether! This is also a matter of "correct
writing" for the documentation's authors: they should use proper
punctuation, vocabulary, etc.

- Another way would be an automatic "routine" that creates such
"groups of keywords" alongside the PO file creation; the computer would
compare all the sentences and sub-sentences of the document to find all
the repeated words or groups of words... so that their translation need
not be repeated again and again! Should this happen after the
msgid/msgstr entries are created, inside po4a or with some other
software?
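
The two ideas in the questions above can be sketched in Python. Neither
function is part of po4a; the names split_sentences and repeated_phrases
and the abbreviation list are assumptions made for illustration. The
splitter only breaks at punctuation followed by whitespace, so "$4.5" is
left alone, and it skips a short list of known abbreviations such as
"U.N."; the second function counts word groups that recur across
messages.

```python
import re
from collections import Counter

# Hypothetical list; a real one would have to be maintained per language.
ABBREVIATIONS = {"U.N.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split at . ! ? ; : followed by whitespace, skipping known abbreviations."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?;:]\s+", text):
        candidate = text[start:match.start() + 1]
        words = candidate.split()
        if words and words[-1] in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

def repeated_phrases(messages, n=2, min_count=2):
    """Return the n-word groups that occur at least min_count times overall."""
    counts = Counter()
    for message in messages:
        words = message.split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

print(split_sentences("It costs $4.5 today. The U.N. agreed. Done"))
print(repeated_phrases(["Open the Transaction Journal now",
                        "The Transaction Journal lists entries"]))
```

The splitter still fails when a sentence really does end with an
abbreviation or when the final period is missing, which is the
"correct writing" problem mentioned above.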






