UTF-8 and ispell

Paul Boekholt p.boekholt at gmail.com
Sat Sep 22 12:52:18 UTC 2007


2007/9/22, Rafael Laboissiere <rafael at debian.org>:
> > Ispell seems to count bytes here. So it looks like ispell.sl doesn't
> > work with ispell in utf-8 mode ATM.
>
> It works fine here, provided that an appropriate entry is included in
> jed-ispell-dicts.sl, for example:
>
>     ispell_add_dictionary (
>       "german-new8-utf8",
>       "ngerman",
>       "ÄÖÜäößü",
>       "[']",
>       "~utf8",
>       "-C -d ngerman");
>
> (Note that the accented characters must be in UTF-8 encoding.)

I think it will work some of the time, but not all the time.

ispell -d ngerman -T utf8 -a
@(#) International Ispell Version 3.1.20 10/10/95, patch 1
schon guut
+ SCHONEN
& guut 3 6: Glut, Gurt, gut

schön guut
*
& guut 3 7: Glut, Gurt, gut

As you can see, the offset is 6 in the first case but 7 in the second case:
"ö" takes two bytes in UTF-8, and ispell counts bytes, while aspell counts
characters.
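The difference can be shown with a short sketch. It parses a "&" line from
pipe mode (format "& word count offset: suggestion, ..." as in the output
above) and converts ispell's byte offset into the character offset that
go_right() expects. It is written in Python only for illustration and assumes
the line sent to ispell was UTF-8 encoded. The function names are
hypothetical and are not part of ispell.sl.

```python
def parse_miss(line):
    """Parse '& word count offset: s1, s2, ...' into (word, offset, suggestions)."""
    head, _, tail = line.partition(": ")
    _marker, word, _count, offset = head.split()
    suggestions = tail.split(", ") if tail else []
    return word, int(offset), suggestions


def byte_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Count the characters in the first byte_offset bytes of text.

    Assumes byte_offset falls on a character boundary, which holds for
    ispell's offsets because they point at the start of a word; otherwise
    decode() raises UnicodeDecodeError.
    """
    return len(text.encode(encoding)[:byte_offset].decode(encoding))


word, offset, suggestions = parse_miss("& guut 3 7: Glut, Gurt, gut")
# ispell reported byte offset 7 for "schön guut"; go_right() needs 6.
print(word, offset, byte_to_char_offset("schön guut", offset))
```

With plain ASCII input such as "schon guut" the two offsets coincide, which is
why the bug only shows up on lines containing non-ASCII characters.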

Ispell.sl uses go_right(), which counts characters, so it works with aspell
but not with ispell. To get it to work with ispell, we would first need to get
John to add a go_right_bytes() function. Then ispell.sl would need to know
whether it is talking to ispell or aspell. In principle it already does,
through the variable Ispell_Program_Name, but that isn't reliable: Jörg, for
example, has made ispell a symlink to aspell, and other users may use all
kinds of wrappers.
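One way around symlinks and wrappers might be to look at the banner line the
process prints when it starts in -a mode, rather than at
Ispell_Program_Name. The sketch below assumes that aspell identifies itself
there with a banner like "@(#) International Ispell Version 3.1.20 (but
really Aspell 0.60.5)"; the exact wording and version number are assumptions
to check against your installation. The function names are hypothetical, and
in ispell.sl this would be S-Lang rather than Python.

```python
def backend_from_banner(banner):
    """Guess which program sits behind the pipe from its -a mode banner."""
    # Assumption: aspell's banner mentions "Aspell"; real ispell's does not.
    return "aspell" if "Aspell" in banner else "ispell"


def offsets_are_bytes(banner):
    """ispell reports byte offsets, aspell reports character offsets."""
    return backend_from_banner(banner) == "ispell"


print(offsets_are_bytes("@(#) International Ispell Version 3.1.20 10/10/95, patch 1"))
```

This only needs the first line the process writes, so it works no matter
which name the program was started under.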

> > Does anybody still use ispell? Why, when aspell is better?
>
> I still use ispell because I frequently call ispell_change_dictionary.  If
> there is a reasonable way of switching dictionaries in jed when using
> aspell, then I would use aspell instead of ispell.  Could ispell.sl be
> changed in order to achieve that?  Maybe we should add calls like:
>
>     aspell_add_dictionary (
>       "german-new8-utf8",
>       "ngerman",
>       "de_DE");
>
> and then, when Ispell_Program_Name is set to "aspell", call aspell like
> this: "aspell -a -l german-new8-utf8".  I could do the necessary changes to
> the dictionaries-common package in order to make this work in Debian.

I'm not sure what you're after here. So far I've used the same data
structures for dealing with both ispell and aspell, and that has worked OK
for me. But if you use ispell for some languages and aspell for others, then
it could be improved.

What's happening with aspell in Debian anyway? In
http://dict-common.alioth.debian.org/dsdt-policy.html it says

Because of the major changes in aspell 0.50 most aspell elements present in
previous versions of this policy are being removed. Specifically
pspell-ispell support is no longer available and pwli files are no longer
used. Some way of coordinating ispell and aspell use under emacs is being
implemented (see the Section called Registering aspell dictionaries for use
from within emacs). Besides that, some reference to aspell might have been
left in the document, so be careful with them.

Is that document still up to date?



More information about the Pkg-jed-devel mailing list