[Po4a-devel] error parsing document header
D. Barbier
bouzim at gmail.com
Thu Sep 27 13:05:30 UTC 2012
On 2012/9/27 David Prévot wrote:
> Hi,
>
> On 27/09/2012 07:55, D. Barbier wrote:
>
>> Indeed, this is due to accented characters.
>> It seems that length() returns the number of bytes and not characters.
>> I looked at Unicode issues with Perl a very long time ago and do not
>> remember about its quirks; if anyone has a clue, please tell ;-)
>
> Thomas, CCed, helped us a lot for the DPNhtml2mail script [0], and
> managed to make that work.
>
> 0: http://anonscm.debian.org/viewvc/publicity/dpn/scripts/DPNhtml2mail.pl?view=co
>
> I guess the magic happens at the end of the following code:
>
> # number of columns of a string
> sub _columns {
>     my $str = scalar shift;
>
>     return 0 if ( !defined $str || $str eq '' );
>
>     $str = decode_utf8($str) unless utf8::is_utf8($str);
>     return Unicode::GCString->new($str)->columns();
> }

Thanks David,

This seems to be different: you are computing the display width of the
string, whereas I need the number of characters.

I believe all we need is to add an ":encoding(foo)" layer when opening
the file for reading; the encoding must be specified anyway and is thus
known.
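
To illustrate, here is a minimal sketch of the idea, assuming a UTF-8
input file. The helper name first_line_length and the sample text are
made up for this example and are not taken from po4a:

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper (not part of po4a): read the first line of $path,
# optionally through an ":encoding(...)" layer, and return what length()
# reports for it.
sub first_line_length {
    my ($path, $charset) = @_;
    my $mode = defined $charset ? "<:encoding($charset)" : '<:raw';
    open(my $fh, $mode, $path) or die "Cannot open $path: $!";
    my $line = <$fh>;
    close($fh);
    chomp $line;
    return length($line);
}

# Write a sample header line with two accented characters, stored as UTF-8.
# \x{e9} is "e with acute accent", which takes two bytes in UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw');
print {$out} encode('UTF-8', "Pr\x{e9}vot \x{e9}crit\n");
close($out);

# Without a layer the line is a byte string, so length() counts bytes.
printf "without layer: %d\n", first_line_length($path);

# With the layer the line is decoded on read, so length() counts characters.
printf "with layer:    %d\n", first_line_length($path, 'UTF-8');
```

With this sample the first call should report 14 and the second 12: the
line has 12 characters, two of which take two bytes each in UTF-8.
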
Denis