[Freedombox-discuss] libferris for index and search on the freedombox

monkeyiq monkeyiq at gmail.com
Sun Mar 25 06:02:11 UTC 2012


Hi,
  Apologies if this message is off topic. I have made deb files of
libferris for the FreedomBox available at [1]. This build should include
the recent optimizations for regular expression evaluation detailed
at [2]. I suspect these benchmarks would improve further with an ext4
filesystem created with stride/stripe settings matched to the SD card
at hand, or with another filesystem tailored to that card. There are
command line, web, virtual filesystem, and mobile app [3] interfaces for
searching these indexes.
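
For anyone who wants to try the stride idea, the arithmetic is small.
This is a hedged sketch, not a tested recipe: the 4 MiB erase-block
size and the /dev/mmcblk0p2 device name are assumptions, and setting
stripe_width equal to stride is a common heuristic for a single flash
device rather than something benchmarked here. It only prints the
mkfs.ext4 command, because running it would erase the partition.

```bash
#!/bin/bash
# Sketch: derive ext4 stride/stripe_width for an SD card partition.
# ASSUMPTIONS: the erase-block size and device below are placeholders;
# find the real erase-block size of your card before using this.
ERASE_BLOCK_BYTES=$((4 * 1024 * 1024))   # hypothetical 4 MiB erase block
FS_BLOCK_BYTES=4096                        # ext4 block size passed to -b
DEVICE=/dev/mmcblk0p2                      # hypothetical target partition

# stride = number of filesystem blocks that fit in one erase block
STRIDE=$((ERASE_BLOCK_BYTES / FS_BLOCK_BYTES))

# Print rather than run: formatting destroys the partition's contents.
echo "mkfs.ext4 -b $FS_BLOCK_BYTES -E stride=$STRIDE,stripe_width=$STRIDE $DEVICE"
```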

Thoughts are very welcome. Either here, on the libferris mailing lists,
or directly to me.

Default metadata and full text indexes can be set up and linked using
the following:

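freed $ cat clucene-remake.sh
#!/bin/bash
# Recreate empty metadata (EA) and full text indexes under ~/.ferris.
set -e
mkdir -p ~/.ferris
cd ~/.ferris
# Remove any previous indexes.
rm -rf ~/.ferris/ea-index ~/.ferris/full-text-index
mkdir ea-index
# The full text index path is a symlink to the EA index directory.
ln -s ea-index full-text-index
cd ea-index
fcreate --create-type fulltextindexclucene --rdn=ftxindex "$(pwd)"
fcreate --create-type=eaindexclucene --rdn=eaindex db-exists=1 "$(pwd)"
# Attach the full text index to the EA index for mixed queries.
feaindex-attach-fulltext-index -P "$(pwd)" -F "$(pwd)"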

There are some articles on the Web already explaining why I separated
metadata (Extended Attribute, EA) indexing and full text (the English
content of a file) indexing into different abstractions. The script
above links the two at creation time, allowing queries that mix
metadata and full text terms.
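
Because full-text-index is a symlink to ea-index, the -P and -F
arguments in clucene-remake.sh both name the same directory. The sketch
below reproduces only that directory layout in a scratch directory (no
libferris commands are run) to show that both names resolve to one
physical directory.

```bash
#!/bin/bash
# Sketch: rebuild the ~/.ferris layout from clucene-remake.sh in a
# scratch directory, without libferris, and resolve both index paths.
set -euo pipefail

demo=$(mktemp -d)                        # scratch stand-in for ~/.ferris
mkdir "$demo/ea-index"
ln -s ea-index "$demo/full-text-index"   # relative link, as in the script

# pwd -P prints the physical path with symlinks resolved.
ea_real=$(cd "$demo/ea-index" && pwd -P)
ftx_real=$(cd "$demo/full-text-index" && pwd -P)
echo "ea-index        -> $ea_real"
echo "full-text-index -> $ftx_real"

rm -rf "$demo"
```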

The index can be populated with the following. Note that I use split on
the find output to keep process sizes and indexing runtimes down: each
feaindexadd or findexadd process only sees 5000 paths. The
EXPLICIT_WHITELIST is a comma-separated list of the metadata from your
filesystem to include in the index. More information (like the width of
JPEG files) can be added, but it will slow down indexing. Of course
you can index different subsets of files with different collections of
metadata as appropriate.
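
A minimal sketch of the two mechanisms in the paragraph above: the
whitelist is just field names joined with commas, and split -l 5000
turns one long listing into numbered chunk files. Here seq stands in
for find so the example runs anywhere, and the field list is only a
subset of the one in clucene-update.sh.

```bash
#!/bin/bash
# Sketch: build a comma-separated whitelist and split a listing into
# chunks. seq replaces find so no real filesystem walk is needed.
set -euo pipefail

fields=(name size mtime user-owner-name)   # subset of the post's list
whitelist=$(IFS=,; echo "${fields[*]}")     # join array with commas
echo "LIBFERRIS_EAINDEX_EXPLICIT_WHITELIST=$whitelist"

work=$(mktemp -d)
cd "$work"
seq 12000 | split -l 5000 - demo.split.     # 12000 fake "paths"

# Each chunk would later be fed to its own indexing process.
chunks=(demo.split.*)
for chunk in "${chunks[@]}"; do
    echo "$chunk: $(wc -l < "$chunk") lines"
done
last_lines=$(wc -l < "${chunks[${#chunks[@]}-1]}")

cd / && rm -rf "$work"
```

With 12000 input lines this yields three chunks (aa, ab, ac) holding
5000, 5000, and 2000 lines.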

freed $ cat clucene-update.sh
#!/bin/bash
# Comma-separated list of metadata to store in the EA index.
export LIBFERRIS_EAINDEX_EXPLICIT_WHITELIST=name,size,mtime,mtime-display,atime,ctime,user-owner-name,group-owner-name,user-owner-number,group-owner-number,inode
TMPDIR=~/tmp
EAIDXPATH=~/.ferris/ea-index

# Start with an empty scratch directory for the split file lists.
mkdir -p "$TMPDIR"
rm -rf "${TMPDIR:?}"/*

# Split each file listing into chunks of 5000 paths.
cd "$TMPDIR"
find /usr | split -l 5000 - usr.split.
find ~    | split -l 5000 - home.split.
find /etc | split -l 5000 - etc.split.

# Add metadata, one feaindexadd process per chunk.
cd "$EAIDXPATH"
rm -f write.lock   # clear any index write lock left by an earlier run
for f in "$TMPDIR"/*split.*
do
   echo "Processing $f"
   feaindexadd -P "$(pwd)" -1 < "$f"
done

# Add /usr/share/doc to the full text index.
cd "$TMPDIR"
find /usr/share/doc | split -l 5000 - ftxusrdoc.split.
cd "$EAIDXPATH"
export LIBFERRIS_INDEX_NO_REMOVE=1
rm -f write.lock
for f in "$TMPDIR"/ftx*split.*
do
   findexadd -P "$(pwd)" -1 -v < "$f"
done
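
The two loops above share one pattern: feed each chunk file to a fresh
indexer process on stdin. A hypothetical helper, run_per_chunk, could
factor that out. It is a sketch, not part of libferris, and wc -l
stands in for the indexer so it runs without libferris installed.

```bash
#!/bin/bash
# Sketch: run a command once per chunk file, with the chunk on stdin.
# run_per_chunk is a hypothetical helper, not a libferris tool.
set -euo pipefail

# usage: run_per_chunk CMD [ARGS...] -- CHUNK...
run_per_chunk() {
    local cmd=()
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do cmd+=("$1"); shift; done
    shift                                # drop the "--" separator
    local chunk
    for chunk in "$@"; do
        echo "Processing $chunk" >&2
        "${cmd[@]}" < "$chunk" || { echo "failed on $chunk" >&2; return 1; }
    done
}

work=$(mktemp -d)
printf '/etc/hosts\n/etc/passwd\n' > "$work/a.split.aa"
printf '/usr/bin/env\n' > "$work/a.split.ab"
counts=$(run_per_chunk wc -l -- "$work"/a.split.* 2>/dev/null)
echo "$counts"
rm -rf "$work"
```

Unlike the original loops, this helper stops at the first failing
chunk. With libferris installed, the first loop might become
run_per_chunk feaindexadd -P "$EAIDXPATH" -1 -- "$TMPDIR"/*split.*
(untested).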



[1] http://fuuko.libferris.com/
[2] http://monkeyiq.blogspot.com.au/2012/03/libferris-in-512mb-ram-on-arm5-at-12ghz.html
[3] http://monkeyiq.blogspot.com.au/2012/03/ferris-on-n9-search-by-url-content-and.html
