[Dict-common-dev] UTF-8 and ispell

Rafael Laboissiere rafael at debian.org
Sat Sep 29 11:17:43 UTC 2007


* Paul Boekholt <p.boekholt at gmail.com> [2007-09-29 12:43]:

> That sounds like a problem. I guess the string is longer than 256 characters.
> From the S-Lang manual:
>   Although there is no imposed limit on the length of a string, string
>   literals must be less than 256 characters in length.  It is possible
>   to construct strings longer than this by string concatenation, e.g.,
> 
>              "This is the first part of a long string"
>               + " and this is the second part"
> 
> Since DictionariesCommon generates S-Lang code, this limitation applies.
> 
> Broadly, there are three ways to fix this:
> - catch the "String too long" error in a catch block. Tough luck for
> Bulgarian speakers.
> - Split the string up, as suggested in the manual. An example of how to do
> this can be found in the autotext.sl mode.
> - Instead of generating S-Lang code, generate some data file and provide an
> S-Lang script to parse those data. One way to do this would be to
> generate XML and parse that with the expat module. Another way would be
> to store the data in a SQLite table, and in fact the next version of
> autotext.sl may do that. Or maybe the readascii.sl library provided with
> slsh can be used for this. Note that if the string isn't sourced by
> S-Lang, you don't get the "\x{__}" substitution.
> 
> I think I'd go for the second option.

I would go for a simpler solution, as I wrote previously in this thread:
just ask the aspell-bg maintainer to convert the info-aspell file to the
national character encoding.  Perhaps we should also enforce this through
the dictionaries-common Policy.

I do not think that we would need more than 256 characters in a string in
jed-ispell-dicts.sl.  If that happens in the future, I would go for the
second option as you suggested.
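
For the record, the second option could be handled on the generator side
roughly as in the sketch below.  Python is used purely for illustration and
is not a claim about how DictionariesCommon is implemented.  The names
slang_escape and split_slang_literal and the 200-byte chunk size are my own
assumptions, and the input is assumed to be raw text, not text that already
contains S-Lang escapes.

```python
def slang_escape(text):
    """Escape backslash, double quote and newline for an S-Lang string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def split_slang_literal(text, max_bytes=200):
    """Return an S-Lang expression made of short literals joined with '+'.

    max_bytes is a hypothetical safety margin below the 256-character
    limit on string literals quoted from the S-Lang manual.  Sizes are
    measured in UTF-8 bytes to stay on the conservative side.
    """
    pieces, chunk, size = [], "", 0
    for ch in text:
        # Escape one character at a time so a chunk boundary can never
        # fall in the middle of an escape sequence.
        esc = slang_escape(ch)
        n = len(esc.encode("utf-8"))
        if chunk and size + n > max_bytes:
            pieces.append(chunk)
            chunk, size = "", 0
        chunk += esc
        size += n
    pieces.append(chunk)
    return "\n  + ".join('"' + p + '"' for p in pieces)


if __name__ == "__main__":
    # Emits three quoted literals joined by "+", each at most 200 bytes.
    print(split_slang_literal("x" * 450))
```

The generated expression has the same shape as the concatenation example
from the manual, so it could be dropped wherever a single literal is
written to jed-ispell-dicts.sl today.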

-- 
Rafael


