[Po4a-devel][RFC] Multi-lines verbatim blocks

Nicolas François nicolas.francois@centraliens.net
Wed, 17 Nov 2004 22:03:09 +0100



On Wed, Nov 17, 2004 at 01:17:12AM +0100, Denis Barbier wrote:
> On Mon, Nov 15, 2004 at 11:55:31PM +0100, Nicolas François wrote:
> > Hello,
> >
> > To solve an issue with the man module, I've implemented a way to specify
> > some (multi-lines) verbatim blocks.
>
> I did not understand your message at first reading because I thought
> that 'verbatim blocks' were unformatted, maybe you should talk about
> 'untranslated blocks' instead.

I wanted to stress that these blocks are not only untranslated, but also
left untouched by the parser (no rewrapping or reformatting of any kind).

> > With my first try, I could correctly process 60 additional files (without
> > even defining additional macros). 25 different blocks were used (each one
> > defined in a file).
> >
> >
> > Everything is not so neat:
> >   - if the block appear in the middle of a paragraph, the block will be
> >     copied verbatim at a wrong place: at the beginning of the paragraph
> >     (this could be fixed by adding a function to "flush" the parser).
> >     Most of the time, it is not a concern because the macros are defined
> >     in the header and it is up to the user to specify or not a verbatim
> >     block, but it should probably be fixed.
>
> IMO such blocks should split paragraphs into 3 parts: before, block
> itself, after.

I'm not really worried about notifying the parser. Do you have any ideas
for the user interface?
    - The user has to be notified in the PO file (this will probably be
      module-specific), but something like a U<There is an untranslated
      block here> could be OK for the man module.
    - The user has to specify whether the block ends a paragraph or not.
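
A minimal sketch of the three-way split suggested above (before, block,
after), assuming a paragraph and a block are both held as lists of lines.
split_around_block is a hypothetical helper for illustration, not an
existing po4a function:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of po4a: split a paragraph (array ref of
# lines) around the first occurrence of a block (array ref of lines).
# Returns three array refs: lines before, the block itself, lines after.
# If the block is not found, the whole paragraph is returned as "before".
sub split_around_block {
    my ($para, $block) = @_;
    my $n = @$block;
    if ($n > 0) {
        for my $i (0 .. @$para - $n) {
            my $match = 1;
            for my $j (0 .. $n - 1) {
                if ($para->[$i + $j] ne $block->[$j]) {
                    $match = 0;
                    last;
                }
            }
            next unless $match;
            my @before = @{$para}[0 .. $i - 1];
            my @after  = @{$para}[$i + $n .. $#{$para}];
            return (\@before, [@$block], \@after);
        }
    }
    return ([@$para], [], []);
}

my ($pre, $mid, $post) = split_around_block(
    ["Some text\n", ".de XX\n", "..\n", "More text\n"],
    [".de XX\n", "..\n"],
);
printf "%d %d %d\n", scalar(@$pre), scalar(@$mid), scalar(@$post);
```

Only the "before" and "after" parts would then go through the usual
translation path; the middle part would be pushed as is.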

> >   - it makes po4a slower (I don't think it's an issue, except for the
> >     testsuite;): each lines of the input document has to be compared with
> >     a line of each specified files (and lines have to be shifted/unshifted
> >     many times).
>
> Do not worry, I have some ideas to optimize regexes.  Do you have some
> large file which could be useful for benchmarking?

My first idea is to sort the man pages by size. The next time I run the
check script, I will record the CPU time used by po4a for each page.
I will also try to build an interesting man page.

> >   - the user can do whatever he wants and it is, as the addendums, a quite
> >     complicated feature which can be error prone.
>
> I do not follow you here, my understanding was that blocks were copied
> from original file, so translators have almost no control here.

Forget about it:
I was only thinking of a scenario where the user is bored and adds
something that should be translated to a "multi-lines untranslated block".


> > Do you think it may be worth having such a mechanism in po4a?
> > Is there an interest for the other modules?
>
> Sounds like a very good idea, but maybe you should first explain in more
> details how you want to proceed.

Please find attached my current implementation (don't worry, I don't
intend to commit it as is).

It is not optimized at all (I'm reading all the little files for each
line of the man page). It was done to test whether the man pages really
always use the same blocks.
I've put all the "untranslated blocks" in separate files in the same
directory (its path is currently hardcoded).
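
One possible way to avoid re-reading the files for every line, as a rough
sketch only: load all the block files once and compare against the cached
lines. The names load_blocks and match_block_at are hypothetical, and the
input is assumed to be available as a plain list of lines rather than
through shiftline/unshiftline:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of po4a: read every text file of a
# directory once and return a hash ref mapping file name => lines.
sub load_blocks {
    my ($dir) = @_;
    my %blocks;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    for my $file (sort readdir $dh) {
        my $path = "$dir/$file";
        next unless -f $path && -T $path;
        open(my $fh, '<', $path) or die "Can't open $path: $!";
        $blocks{$file} = [<$fh>];
        close($fh);
    }
    closedir($dh);
    return \%blocks;
}

# Return the name of the first block (in sorted name order) whose lines
# all match the input lines starting at index $pos; return nothing
# (undef in scalar context) when no block matches.
sub match_block_at {
    my ($lines, $pos, $blocks) = @_;
    for my $name (sort keys %$blocks) {
        my $block = $blocks->{$name};
        next unless @$block && $pos + @$block <= @$lines;
        my $ok = 1;
        for my $j (0 .. $#{$block}) {
            if ($lines->[$pos + $j] ne $block->[$j]) {
                $ok = 0;
                last;
            }
        }
        return $name if $ok;
    }
    return;
}
```

The cache would be built once (for instance from initialize), so each
input line is only compared with in-memory lines.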

BTW, is there a way to forward-declare a function in Perl? (I moved
initialize to the end because I needed \&translate_joined and
\&untranslated.)
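
For reference, a small standalone sketch of a forward declaration in Perl.
The %macro table and the sub names only mimic the ones in the man module;
this is not the module's actual code. As far as I understand, "sub NAME;"
declares the name without a body, and taking \&NAME inside a sub that runs
after compilation works even when the definition appears later in the file:

```perl
use strict;
use warnings;

# Forward declaration: the name is known to the parser before its body.
sub untranslated;

my %macro;

sub initialize {
    # Named subs are installed at compile time, so this reference is
    # valid by the time initialize() actually runs.
    $macro{'UC'} = \&untranslated;
}

initialize();
print $macro{'UC'}->('UC'), "\n";

# The definition can live further down in the file.
sub untranslated {
    my ($name) = @_;
    return "untranslated: $name";
}
```

The forward declaration mostly matters when the sub is called without
parentheses before its definition; the reference itself does not need it.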

I'm also adding some of the files I used with the man module.

> > Note:
> > For the man module, I'm also planning to add a verbatim and translate
> > option (like in the sgml module), which should allow to specify the
> > behaviour of the parser for these additional macros.
> >
> > Another Note:
> > I've not tried this, but it may solve #263298 (Please let -gettextize know
> > about addendums and remove them automatically).
>
> This would be really cool, I am converting man pages of manpages-fr by hand,
> and it is quite boring ;)

If you need the verbatim and translate options, I can quickly commit them
to CVS.

-- 
Nekral

--NzB8fVQJ5HfG6fxh
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="Man.pm.patch"

--- Man.pm.orig	2004-11-17 19:48:11.000000000 +0100
+++ Man.pm	2004-11-17 21:59:30.000000000 +0100
@@ -246,8 +246,6 @@
 # .    - a single char (e.g. B, I, R, P, 1, 2, 3, 4, etc.)
 my $FONT_RE = "\\\\f(?:\\[[^\\]]*\\]|\\(..|[^\\(\\[])";
 
-sub initialize {}
-
 #########################
 #### DEBUGGING STUFF ####
 #########################
@@ -275,6 +273,46 @@
         # end of file
         return ($line,$ref);
     }
+REIGN:
+my $dir="$ENV{HOME}/sources/po4a/cvs/po4a/testsuite/ignore/"; # opendir does not expand '~'
+opendir DIR, $dir;
+while (my $file= readdir DIR) {
+    if (-T "$dir/$file") {
+    open(IGNORE, "$dir/$file")
+        or die "Can't open file";
+    my @lines=<IGNORE>;
+    close(IGNORE);
+    my @tmp=(); # Keep the shifted strings in case they don't match the file
+    my $test=shift(@lines);
+    while (defined $line and defined $test and $test eq $line) {
+        push(@tmp, ($line,$ref));
+        ($line,$ref) = $self->SUPER::shiftline();
+        $test = shift(@lines);
+    }
+    if (defined $test) {
+        # restore the lines
+        while (@tmp) {
+            $self->SUPER::unshiftline($line, $ref);
+            $ref = pop(@tmp);
+            $line = pop(@tmp);
+        }
+    } else {
+        print "<$file\n";
+        while (@tmp) {
+            # This part is just copied as is, no translation needed
+            my ($l,$r)=(shift(@tmp),shift(@tmp));
+            $self->pushline($l);
+        }
+        if(! defined $line) {
+            # both reached the end
+            return ($line,$ref);
+        }
+        goto REIGN;
+    }
+    }
+}
+
+
 
     # Handle some escapes
     #   * reduce the number of \ in macros
@@ -1436,3 +1474,40 @@
 ###        this from the generated manpage, and declare our own header
 ###
 $macro{'UC'}=$macro{'AT'}=\&untranslated;
+
+
+
+
+sub initialize {
+    my $self = shift;
+    my %options = @_;
+
+    $self->{options}{'translate'}='';
+    $self->{options}{'verbatim'}='';
+    $self->{options}{'debug'}='';
+
+    foreach my $opt (keys %options) {
+        if ($options{$opt}) {
+            die sprintf("po4a::man: ".dgettext ("po4a","Unknown option: %s"), $opt)."\n" unless exists $self->{options}{$opt};
+            $self->{options}{$opt} = $options{$opt};
+        }
+    }
+    if ($options{'verbatim'}) {
+        foreach (split(/,/, $options{'verbatim'})) {
+            print "verbatim'$_'\n";
+            $macro{$_}=\&untranslated;
+        }
+    }
+    if ($options{'translate'}) {
+        foreach (split(/,/, $options{'translate'})) {
+            $macro{$_}=\&translate_joined;
+        }
+    }
+    if ($options{'debug'}) {
+        foreach ($options{'debug'}) {
+            $debug{$_} = 1;
+        }
+    }
+}
+
+

--NzB8fVQJ5HfG6fxh
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="ignore.tar.bz2"
Content-Transfer-Encoding: base64


--NzB8fVQJ5HfG6fxh--