[Po4a-devel] Feature request - segment at sentence level

Karol Krenski mimoohowy at gmail.com
Tue Feb 12 00:55:49 UTC 2013


OK, so I wrote a sentence segmenter on top of po4a. It works this way:

1. po4a-gettextize creates the initial .po with paragraph-level
segments.

2. The attached po4a-segment.pl -S(plit) splits the paragraphs into
sentences (currently on a period followed by a space; abbreviations and
more rules still need to be added). Each sentence records the block
(paragraph) it belongs to, so that it can be merged back into its
paragraph at the end.

3. Translators work with this sentence-level file. My wife is a
translator, and don't tell me a sentence is poor context: you guys
were just too lazy to code the sentence splitter :)

4. Once the translators are done, po4a-segment.pl -M(erge) is run
to restore the original po4a paragraph segments.
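The split/merge round trip in steps 2 and 4 can be sketched as follows. This is a minimal Python illustration, not the attached Perl script: the function names are hypothetical, paragraphs are plain strings instead of PO entries, and block membership is kept as a (block_id, sentence) pair, whereas how po4a-segment.pl actually stores it in the .po file is not shown here. The split rule is only the period-plus-space rule described above.

```python
import re
from itertools import groupby

# Hypothetical rule: a sentence ends at a period followed by one space.
# The space is consumed; abbreviations such as "e.g. " are NOT handled.
_BOUNDARY = re.compile(r"(?<=\.) ")


def split_paragraphs(paragraphs):
    """Return (block_id, sentence) pairs; block_id is the paragraph index."""
    segments = []
    for block_id, para in enumerate(paragraphs):
        for sentence in _BOUNDARY.split(para):
            segments.append((block_id, sentence))
    return segments


def merge_segments(segments):
    """Rejoin consecutive sentences sharing a block_id into one paragraph."""
    merged = []
    for _, group in groupby(segments, key=lambda seg: seg[0]):
        # Re-insert the single space that the split consumed.
        merged.append(" ".join(sentence for _, sentence in group))
    return merged


paras = ["First sentence. Second one.", "Another paragraph."]
segs = split_paragraphs(paras)
print(segs)
print(merge_segments(segs) == paras)
```

Because exactly one space is removed at each boundary and one is re-inserted on merge, an untranslated file round-trips unchanged; translated sentences are merged the same way, keyed only by their block_id.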

Quickly tested with a LibreOffice file (attached 1.odf, with content.xml
inside) and with LaTeX. It seems to work for the attached file: I am able to
split, translate, merge back into paragraphs and, at the end, successfully
run po4a-translate! :)

The code is well documented, and I intend to maintain it if you
merge it into the po4a package. Of course some features are still missing,
such as more rules for sentence splitting, picking the proper LaTeX (and
other format) environments that can be translated, encodings, etc. But
the basic work is done, and if it's accepted I can work more on it.
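One of the missing splitting rules, an abbreviation exception, could look roughly like this. It is a Python sketch under assumptions: the abbreviation set and the function name are hypothetical, and a real rule set would need to be per-language and configurable. The idea is to keep the period-plus-space boundary but skip it when the word ending at that period is a known abbreviation.

```python
import re

# Hypothetical exception list; a real one would be per-language and configurable.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split on period+space unless the word ending at that period is an abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"\. ", text):
        end = match.start() + 1            # keep the period with the sentence
        last_word = text[start:end].split()[-1]
        if last_word in abbreviations:
            continue                       # not a real sentence boundary
        sentences.append(text[start:end])
        start = match.end()                # skip the single separating space
    sentences.append(text[start:])         # remainder after the last boundary
    return sentences


print(split_sentences("See e.g. the manual. It helps."))
```

As with the plain period-plus-space rule, each accepted boundary consumes exactly one space, so joining the pieces with a single space restores the original paragraph for the merge step.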

-- 
Karol Kreński
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.odf
Type: application/vnd.oasis.opendocument.formula
Size: 14775 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/po4a-devel/attachments/20130212/34fe2ca6/attachment.odf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: po4a-segment.pl
Type: application/x-perl
Size: 4145 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/po4a-devel/attachments/20130212/34fe2ca6/attachment.bin>

