[Dict-common-dev] UTF-8 and ispell

Rafael Laboissiere rafael at debian.org
Sat Sep 29 09:03:02 UTC 2007

* Paul Boekholt <p.boekholt at gmail.com> [2007-09-29 10:09]:

> I'll update the modes upstream, but probably not this weekend. I'll probably
> add a check if aspell is installed, also I have to update the documentation.

That's great, thanks.

> 2007/9/28, Agustin Martin <agustin.martin at hispalinux.es>:
> > This means we need a versioned conflict on jed-extra in dictionaries-common.
> > Another possibility is to try adding a fake definition for the aspell adding
> > function in case it is not defined, but I do not know if this is possible and
> > it is probably overkill. What do you think is better?
> I think this can be solved with
> #ifexists aspell_add_dictionary
>  ...
> #endif

Oh yes, this is probably a better idea.  I will implement it in

> > I have been looking at the code and seems OK.
> Here too. Except one thing: in the aspell handling code, you've added
>     $otherchars =~ s/^\[//;
>     $otherchars =~ s/\]$//;
> but not in the ispell handling code. I believe it's also needed there.

Sure.  I will do it for ispell too.
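
For illustration only, here is a minimal sketch of what those two
substitutions do, written in Python rather than the Perl of
DictionariesCommon.pm.  The function name strip_brackets is hypothetical; the
only thing taken from the thread is the behaviour of removing one leading "["
and one trailing "]" from the otherchars value.

```python
def strip_brackets(otherchars: str) -> str:
    """Drop one leading '[' and one trailing ']' if present.

    Mirrors the two substitutions quoted above:
        s/^\\[//   and   s/\\]$//
    (hypothetical helper, not part of dictionaries-common).
    """
    if otherchars.startswith("["):
        otherchars = otherchars[1:]
    if otherchars.endswith("]"):
        otherchars = otherchars[:-1]
    return otherchars
```

Applying the same helper in both the aspell and the ispell code paths would
keep the two consistent, which is the point of the remark above.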

Agustín: should I commit my changes to the CVS repository of dict-common, or
would you prefer to do it yourself?

> > However, when testing the resulting file I noticed that the Bulgarian
> > aspell dict uses \xxx octal escapes, which are not translated.
> Does the Bulgarian aspell dict work in Emacs? I don't have the dict
> installed, also aspell doesn't work for me with Emacs - I get
> ispell-init-process: Can't open /usr/lib/ispell/en_GB.hash
> This is on Etch, I think this was fixed (see Bug #435545)
> If it works in Emacs, does it work by passing the "\xxx" string unchanged
> from the info file into the .el file? Does Perl understand "\xxx"
> strings? S-Lang does, but only in non-utf8 mode. To get a string that
> works out to the same unicode characters both in utf-8 and ascii mode,
> you need to use "\x{FF}" hexadecimal constructs.

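Yes, Perl understands "\xxx" escape sequences, where "xxx" is an octal
number, in double-quoted string literals [1].  However, this does not help us
here: Perl only expands such escapes in literals in its own source code, not
in data read from a file.  When parsing the info-aspell file,
DictionariesCommon.pm therefore sees plain ASCII text, i.e. a literal "\"
followed by [0-7] digits, not the character the escape stands for.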

[1] http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES
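
To make the distinction concrete, here is a minimal sketch in Python (not
Perl, and not what DictionariesCommon.pm currently does) of translating such
literal escapes explicitly.  The function name decode_octal_escapes and the
sample value are assumptions made for the example; cp1251 is the coding
system mentioned for aspell-bg.

```python
import re

# Matches a literal backslash followed by three octal digits (value <= 0o377).
OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_octal_escapes(field: str, encoding: str = "cp1251") -> str:
    """Turn literal '\\xxx' octal escapes into characters.

    'field' is the text exactly as read from a file: each escape is four
    ASCII characters, not one byte.  Hypothetical helper for illustration.
    """
    raw = field.encode("ascii")
    # Replace each escape with the single byte it names.
    unescaped = OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    # Interpret the resulting bytes in the declared coding system.
    return unescaped.decode(encoding)


# A raw string keeps the backslashes literal, as they would be in the file.
sample = r"\340\341"
decoded = decode_octal_escapes(sample)
```

Here sample is eight ASCII characters long, while decoded holds two Cyrillic
letters; the translation only happens because the code asks for it.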

I think the maintainer of aspell-bg should provide a coherent info-aspell
file: if "Coding-System: cp1251" is declared in the file, then all the
*chars fields of the corresponding entry should be in that encoding.  Should
I file a bug report against aspell-bg?
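
What "coherent" means here could be checked mechanically.  The sketch below
is in Python and is only an illustration under stated assumptions: the entry
is taken to be already parsed into a mapping from field name to raw bytes,
the field names in the example (Casechars, Otherchars, Language) are assumed,
and incoherent_chars_fields is a hypothetical name, not an existing
dictionaries-common function.

```python
import re

# A literal backslash followed by three octal digits, left unexpanded.
OCTAL_ESCAPE = re.compile(rb"\\[0-3][0-7]{2}")


def incoherent_chars_fields(coding_system: str, fields: dict) -> list:
    """Return the names of *chars fields that do not match coding_system.

    A field is flagged if it still contains literal octal escapes or if its
    bytes cannot be decoded in the declared coding system.
    """
    bad = []
    for name, value in fields.items():
        if not name.lower().endswith("chars"):
            continue  # only the *chars fields are of interest here
        if OCTAL_ESCAPE.search(value):
            bad.append(name)
            continue
        try:
            value.decode(coding_system)
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Assumed example entry: one field in real cp1251 bytes, one with escapes.
entry = {
    "Language": b"bulgarian",
    "Casechars": b"[\xe0-\xff]",
    "Otherchars": rb"[\340\341]",
}
flagged = incoherent_chars_fields("cp1251", entry)
```

With this assumed entry, only the field that still carries octal escapes is
reported.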

At any rate, the strings in jed-ispell-dicts.sl are too long for aspell-bg
and ispell_init.sl fails here with the error message:

/var/cache/dictionaries-common/jed-ispell-dicts.sl:232: String too long for buffer: found '??'

Is this normal?

