[Pkg-crosswire-devel] Miscellaneous responses to various threads

DM Smith dmsmith555 at yahoo.com
Mon Jan 26 21:54:20 GMT 2009


Please pardon me for not replying to threads individually. I've only 
recently joined and much of what I say pertains to emails not in my inbox.

Regarding this effort:
Many, many thanks!

Regarding ICU:
osis2mod and tei2mod both require it for proper building of UTF-8 texts.
Specifically, they use ICU to convert cp1252 to UTF-8 and then normalize
the result to NFC. It is quite possible to do this to the text before
running these command-line apps, using uconv (part of ICU) or an
equivalent.
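
As a rough illustration of that pre-processing step, here is a minimal
Python sketch that uses only the standard library as a stand-in for ICU.
It is not what osis2mod or tei2mod do internally, and the function name
cp1252_to_utf8_nfc is made up for this example.

```python
import unicodedata


def cp1252_to_utf8_nfc(raw: bytes) -> bytes:
    """Decode cp1252 bytes, normalize to NFC, and re-encode as UTF-8.

    Hypothetical helper: it mirrors the two steps described above
    (transcode, then normalize) without using ICU itself.
    """
    text = raw.decode("cp1252")                      # cp1252 -> Unicode
    normalized = unicodedata.normalize("NFC", text)  # canonical composition
    return normalized.encode("utf-8")                # Unicode -> UTF-8


if __name__ == "__main__":
    # 0x92 is the cp1252 right single quotation mark (U+2019).
    print(cp1252_to_utf8_nfc(b"Lord\x92s"))
```

Text that is already UTF-8 must not be passed through the decode step
again; in that case only the NFC normalization applies.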

The SWORD lib (and it is properly branded in upper case) has one place
where ICU is critical. If ICU is present, SWORD's upper case string
converter uses a UTF-8 aware ICU routine; if it is not, the converter
falls back to an ASCII upper case routine (e.g. that from ctype). There
are at least two places where this makes a difference:
1) The rendering of <divineName>Lord's</divineName> into all upper-case 
where the ' is a non-ASCII character. Without ICU, it breaks.
2) Dictionaries are keyed and indexed on upper case words. When a user 
selects a lower case word for lookup, it is uppercased. If it has 
non-ASCII characters, it breaks.
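
The dictionary case can be sketched in a few lines of Python. This is
only an illustration of the failure mode, not SWORD code: ascii_upper
imitates an ASCII-only fallback, unicode_upper stands in for the ICU
routine, and the one-entry index is invented for the example.

```python
from typing import Callable, Optional


def ascii_upper(text: str) -> str:
    """Upper-case only the ASCII letters, as an ASCII-only fallback would.

    bytes.upper() touches ASCII letters only, so every non-ASCII
    character passes through unchanged.
    """
    return text.encode("utf-8").upper().decode("utf-8")


def unicode_upper(text: str) -> str:
    """Upper-case with full Unicode awareness (stand-in for ICU)."""
    return text.upper()


# Hypothetical dictionary index, keyed on properly upper-cased words.
INDEX = {unicode_upper("λόγος"): "word, speech"}


def lookup(word: str, upper: Callable[[str], str]) -> Optional[str]:
    """Upper-case the selected word with the given routine, then look it up."""
    return INDEX.get(upper(word))


if __name__ == "__main__":
    print(lookup("λόγος", unicode_upper))  # key matches the index
    print(lookup("λόγος", ascii_upper))    # key is unchanged, so no match
```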

Regarding e-texts:
The SWORD library is backward compatible with e-texts. It is not forward 
compatible. Modules may have features that don't work for earlier 
engines and break them. (I can give specific examples if needed/wanted.) 
For this reason, the module confs have a minimum SWORD version field,
MinimumVersion. If I am not mistaken, the front-ends' install managers
take this into account, warning the user about and/or preventing the
installation of modules that are incompatible with the SWORD engine.
But, IIRC, once the module is installed, this field is not used. I think
that modules may be another place where having libsword6 and libsword7
co-exist will be problematic.
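
To make the MinimumVersion check concrete, here is a small Python sketch
of what an install manager might do. It is not taken from any front-end:
the function names, the sample conf contents, and the assumption that
versions are purely numeric and dot-separated are all mine.

```python
from typing import Optional, Tuple

# Invented conf contents, for illustration only.
SAMPLE_CONF = """\
[Example]
DataPath=./modules/texts/ztext/example/
MinimumVersion=1.5.9
"""


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.5.9' into (1, 5, 9); trailing zeros are dropped so 1.6 == 1.6.0."""
    parts = [int(p) for p in version.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def minimum_version(conf_text: str) -> Optional[str]:
    """Return the MinimumVersion value from a conf, or None if absent."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "MinimumVersion":
            return value.strip()
    return None


def is_compatible(conf_text: str, engine_version: str) -> bool:
    """True if the engine is at least as new as the module requires."""
    required = minimum_version(conf_text)
    if required is None:
        return True  # assumption: no field means no restriction
    return parse_version(engine_version) >= parse_version(required)


if __name__ == "__main__":
    for engine in ("1.5.8", "1.5.11", "1.6"):
        print(engine, is_compatible(SAMPLE_CONF, engine))
```

Comparing integer tuples rather than strings matters here: as strings,
"1.5.11" would sort before "1.5.9".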

Other fields that should be considered are:
Obsoletes  -- names a module that is replaced by this one
Font  -- names a font which may be necessary for proper viewing of the text

For a complete definition of the conf see: 
http://crosswire.org/wiki/DevTools:confFiles

Another issue is that of copyrights and licenses. We at CrossWire have
made the best attempt to properly obtain permission for copyrighted
material and to properly classify public domain texts. We simply won't
distribute copyrighted texts without written permission. We have been
wrong at times. The classic example is a Portuguese text which was
inappropriately named and dated, perhaps to obscure its ownership, and
thus labelled as public domain. When it was discovered that it was under
current copyright, we immediately took down the text. Our rigid stance
on this has helped us to successfully negotiate for other texts.
CrossWire does not expect to have to let any other distribution know
that a module needs to come down. The mere fact that it is gone from the
CrossWire repository should be sufficient. While we at CrossWire are
willing to bear the risk of making such a mistake, I don't recommend it
to anyone else, especially anyone with a derived income stream. Any
Linux distribution that distributes a wide range of modules would need
to be ready to remove them quickly in such a case.

Regarding modules with which there is no problem, I recommend KJV,
StrongsGreek, StrongsHebrew and Robinson, and perhaps Greek and Hebrew
modules. But this is because I use SWORD to do biblical research. For
daily reading, I'd personally never recommend the KJV. I'd recommend
texts that are under current copyright and for which CrossWire has
permission to distribute.

Regarding indexing:
SWORD has not come up with a versioning mechanism for Lucene indexing.
This is a twofold problem. The first part relates to the version of
Lucene being used. Lucene core has a strict backward compatibility
policy, such that 4.x can read 2.x indexes but possibly not 1.x
indexes. Also, within 2.1-2.9, the index is backward compatible. That
is, an index built under 2.1 can be read by 2.9. The clucene project is
stuck on a very old version of Lucene, 1.4.3, I think.

The second part is how the SWORD engine uses Lucene. Adding new fields
to the index would require some kind of versioning mechanism.

For this reason, indexes won't be pre-built and distributed as part of
the module, but will need to be built on a per-module, per-install basis.

Regarding /usr/share/sword:
IMHO, the biggest breakage regarding /usr/share/sword is that unless it
is writeable one cannot use Lucene to index modules held there. Lucene
indexing is awesome. When we get problem reports regarding
/usr/share/sword at CrossWire, we recommend that the user either log on
as root and change permissions to open it up, or delete the module using
the distribution's package manager and re-install it using the SWORD
installer.
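
A front-end or packaging script could detect this situation up front
instead of failing at index time. The following Python sketch only
checks directory permissions; the function name and the idea that the
index is written beneath the module location are my assumptions, not a
description of how SWORD itself behaves.

```python
import os


def can_index_in_place(module_dir: str) -> bool:
    """Return True if the current user could write an index under module_dir.

    Hypothetical helper: the directory must exist and be both writable
    and searchable (W_OK and X_OK) for new files to be created in it.
    """
    return os.path.isdir(module_dir) and os.access(
        module_dir, os.W_OK | os.X_OK
    )


if __name__ == "__main__":
    path = "/usr/share/sword"
    if can_index_in_place(path):
        print(path, "is writable; an index could be built there")
    else:
        print(path, "is missing or read-only for this user")
```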

Hope this is helpful.

In Christ,
    DM





